
Automating Program Analysis: Abstract Interpretation and Parsing
Explore the concepts of automating program analysis through abstract interpretation and parsing techniques. Abstract interpretation aids in determining possible program executions without running them, while automating parsing involves creating parsing functions for context-free grammars. Discover the challenges and strategies for automating program analysis in this insightful discussion.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
Automating Abstract Interpretation Mooly Sagiv Adapted from Thomas Reps VMCAI 2016 Invited Talk
Static Analysis in a Nutshell Determine information about the possible situations that can arise at execution time, without actually running the program on specific inputs Typically: For each point in the program, find a descriptor that represents (a superset of) the stores that could possibly arise at that point Correctness of an analysis justified via abstract interpretation [Cousot & Cousot 77]
Automating Abstraction Interpretation Abstract interpretation A black art hard to work with 20-year quest to raise the level of automation in abstract interpretation 3-valued logic analysis (TVLA) M. Sagiv, T. Reps, R. Wilhelm, T. Lev-Ami, A. Loginov, R. Manevichs machine-code analysis (TSL) T. Reps, J. Lim symbolic-abstraction algorithms T. Reps, M. Sagiv, G. Yorsh, A. Thakur Reps, T. and Thakur, A., Automating abstract interpretation, VMCAI, 2016. research.cs.wisc.edu/wpis/papers/vmcai16-invited.pdf Patrick Cousot Radhia Cousot
What Does It Mean to Automate Parsing? A parsing-problem instance Parse(L,s) has two inputs L = a context-free language s = a string to be parsed The string changes more frequently than the language A context-free language has a context-free grammar Yacc (and later, Gnu Bison) Input: a context-free grammar that describes the language L to be parsed Output: a parsing function, yyparse(), for which executing yyparse() on string s computes Parse(L,s) Steve Johnson source: simple-talk interview
What Does It Mean to Automate Program Analysis? Follow a similar scheme . . . But first, why would you even want to invest the time doing so?
Why is Program Analysis Difficult? Undecidable problems Just before statement S of program P, does property hold? Dodge this issue by using one-sided algorithms
Sidestepping Undecidability Bad States Reachable States Universe of States
Sidestepping Undecidability Overapproximate the reachable states Bad States Reachable States False positive! Universe of States
Why is Program Analysis Difficult? Undecidable problems Just before statement S of program P, does property hold? Dodge this issue by using one-sided algorithms Infinities Infinite, or very large, state spaces Infinite, or very large, answer sets
Why is Program Analysis Difficult? Large/unbounded base types: int, float, string User-defined types/classes Pointers/aliasing + unbounded # s of heap-allocated cells Procedure calls/recursion/calls through pointers/dynamic method lookup/overloading Concurrency + unbounded # s of threads
Sources of Infinity Data unbounded counters, integer variables, lists, queues Control structures procedures, process creation Configuration parameters unbounded number of processes, principals Real-time discrete or continuous time
Some Successes of the Field Static Driver Verifier, a.k.a. SLAM (Microsoft) Tool for finding possible bugs in Windows device drivers Complicated back-out protocols in driver APIs when events cancelled or interrupted Astr e (ENS) Established the absence of run-time errors in Airbus flight software
Example: Parity Analysis f (a,b) = (16 * b + 3) * (2 * a + 1) * + + * 3 * 1 16 b 2 a + 0 1 2 3 . . . * 0 1 2 3 . . . 0 0 1 2 3 . . . 0 0 0 0 0 . . . 1 1 2 3 4 . . . 1 0 1 2 3 . . . 2 2 3 4 5 . . . 2 0 2 4 6 . . . 3 3 4 5 6 . . . 3 0 3 6 9 . . .
Example: Parity Analysis ?#(a,b) = (16 #b+# 3) # (2 #a+# 1) O # O O +# +# O E O E # # 3 1 E ? E ? 16 b 2 a +#? ? O E # ? O E O E ? E O O E ? ? ? E O E ? O E E E ? ? ? ? E ?#: __ O
Abstract values, such as O, E, and ?, represent potentially infinite collections of concrete values O: { , -3, -1, 1, 3, } E: { , -2, 0, 2, } { , -3, -1, 1, 3, } + { , -2, 0, 2, } = { , -3, -1, 1, 3, } O +# E = O +#? ? O E # ? O E O E ? E O O E ? ? ? E O E ? O E E E ? ? ? ? E
Constant Propagation [i ?, j ?] i = 0 e.e[i 0] [i 0, j ?] j = 0 e.e[j 0] [i 0, j 0] [i 1, j 0] while i 2 e.e e.e [i 0, j 0] j = (j+1)/4 e.e[j (e(j)+1)/4] [i 0, j 0] e.e[i e(i) + 1] i = i+1 [i 0, j 0] printf(i,j)
Constant Propagation [i ?, j ?] i = 0 e.e[i 0] [i 0, j ?] j = 0 e.e[j 0] [i ?, j 0] [i ?, j 0] while ( ) e.e e.e [i 0, j 0] [i ?, j 0] j = (j+1)/4 i { ,-2,-1,0,1,2, } j {0} e.e[j (e(j)+1)/4] [i 0, j 0] [i ?, j 0] e.e[i e(i) + 1] i = i+1 [i 0, j 0] [i ?, j 0] printf(i,j)
What Does It Mean to Automate Abstract Interpretation? An abstract interpreter Interp#(Ms,A,?#) has three inputs Ms = the meaning function for a programming-language statement s A = an abstract domain ?#= an abstract-domain value (represents a set of pre-states) ?# changes more frequently than Ms and A Goal Inputs: (i) a specification of the semantics of a programming language s statements (ii) a specification of abstract domain A Output: a function Is,A( ) such that Is,A(?#) computes Interp#(Ms,A,?#) What formalism should we use to specify Ms? What formalism should we use to specify A?
Abstract Interpretation [CC77] {(2,1), (2,2), (2,3), (3,1), (3,2), (3,3), (4,1), (4,2), (4,3), (5,1), (5,2), (5,3)} {(x 2, y 1), (x 5, y 3)} x [2,5] y [1,3] Universe of States Patrick Cousot Radhia Cousot
Best Transformer [CC79] However, no algorithms to apply the best transformer create the best transformer Best over-approximation of ( (?#)) in A Loss of precision safe # # ?# Universe of States Patrick Cousot Radhia Cousot
Why do we need algorithms for best transformers? Enables parametric semantics X86 Libraries Domain constructors Reduced products Basic blocks and Loop free code Simpler abstract domains
Challenge: Abstract Interpretation is Inherently Non-Compositional [-5,5] + [-10,-5] [5,10] x In computer science, we rely on compositionality languages are expressed using context-free grammars many concepts and properties defined using inductive definitions recursive tree traversals are a basic workhorse software organized into layers Example: (x + ( x)), evaluated in (x [5,10], y [10,20]) [-5, 5] versus [0,0] Suppose that you have in hand a collection of ``best abstract- interpretation operators Their composition may not provide the best (abstract) answer for the composition of the corresponding concrete operations [5,10] x
Predicate Abstraction Verify the safety of a program over infinite data using fixed set of predicates (Booleans) P1, P2, , Pn The meaning of each predicate is a function from the states into Booleans Pi: {0, 1} The program can be conservatively represented by a Boolean program If safety property holds in the Boolean program then it also holds in original program
A Simple Example X := 0 ; while true do { X := X + 1; assert X > 0 } P1 = X 0 P2= X 0 <P1 := 1, P2 := 0> while true do { <P1 := if P1 P2 then 1 else *, P2 := if P1 then 1 else *> assert P1 P2 }
A Simple Example (Transition System) X=0 X=1 X=3 X=4 X= X:=0 X:=X+1 X:=X+1 X:=X+1 X:=X+1 Concrete P1 = X 0 P2= X 0 X:=X+1 # <P1=1, P2:=0> <P1=1, P2:=1> X:=0# X:=X+1# Abstract
Predicate Abstraction: Basics Error Initial Program State Space Abstraction Abstraction: Predicates Signs: Aliasing: Predicates on program state x > 0 &x x > 0 &x &y &y States satisfying the same predicates are equivalent Merged into single abstract state
(Predicate) Abstraction: A crash course Error Initial Program State Space Abstraction Q1: Which predicates : Which predicates are required to verify a property ? Q2: How to compute abstract transformers? : How to compute abstract transformers?
The Predicate Abstraction Domain Fixed set of predicates Pred The meaning of each predicate pi Pred is a closed first order formula fi The relational domain is <P(P(Pred)), , P(Pred), , > Join is set union
A Simple Example Predicates: p1 = x > 0 p2 = y 0 int x, y; x = 1; y = 2 ; while (*) do { x = x + y ; } assert x > 0; bool p1, p2; p1 = true ; p2 = true ; while (*) do { p1 = (p1&&p2 ? 1 : *) } assert p1 ;
Existential Abstraction Given a transition system M=(S, S0, T) and an abstraction : S S# An abstract a transition system M=(S#, S0#, T#) is an existential abstraction of M w.r.t. if s0 S0. (s0) = s0# s0 S0# (s, s ) T. (s) = s# . (s ) = s # (s#, s #) T#
Minimal Existential Abstraction Given a transition system M=(S, S0, T) and an abstraction : S S# An abstract a transition system M=(S#, S0#, T#) is a minimal existential abstraction of M w.r.t. if s0 S0. (s0) = s0# s0 S0# (s, s ) T. (s) = s# . (s ) = s # (s#, s #) T# But how does one compute minimal abstraction? Employ a SAT solver
The SAT Problem Given a propositional formula (Boolean function) =(a b) ( a b c) Determine if is valid Determine if is satisfiable Find a satisfying assignment or report that such does not exit For n variables, there are 2n possible truth assignments to be checked But many practical tools exist a 0 1 b b c c c c 0 1 1 0 1 1 0 0 0 1 0 1
SAT made some progress 100000 10000 1000 Vars 100 10 1 1960 1970 1980 1990 2000 2010 Year
The SMT Problem (Sat Modulu Theory) Given a quantifier free first order formula over some theory equation =3x + y = z f(y) = z Determine if is valid Determine if is satisfiable Find a satisfying assignment or report that such does not exit Tools exist Z3 (Microsoft) CVC
Representing States as Formulas [F F] states satisfying F F F FO formula over prog. vars F {s | s {s | s F F } } [F F1 1] [F F2 2] [F F1 1] [F F2 2] [F F] F F1 1 F F2 2 F F1 1 F F2 2 F F F F1 1 implies F F2 2 [F F1 1] [F F2 2] i.e. F F1 1 F F2 2 unsatisfiable
A Simple Example (Again) X := 0 ; while true do { X := X + 1; assert X > 0 } P1 = X 0 P2= X 0 How can the SMT solver be used to compute the effect of X := X + 1 on P1 and P2?
Symbolic Operations: Three Value-Spaces T T Abstract Values Concrete Values Formulas
Symbolic Operations: Three Value-Spaces T# T Abstract Values Concrete Values Formulas
Symbolic Operations: Three Value-Spaces 2, 4, 16, x=E even(x) Abstract Values Concrete Values Formulas
Symbolic Operations: Three Value-Spaces x ... x x u u1 Abstract Values Concrete Values Formulas
Required Primitive Operations Abstraction (S) = store S (store) ( ) = { } x x u u1 Symbolic concretization ( ) = v1,v2 : nodeu1(v1) nodeu(v2) v1 v2 v : nodeu1(v) nodeu(v) . . . Theorem prover returning a satisfying structure (store) S x u u1
Constant-Propagation Domain T (Var ZT) , where ZT = . . . . . . - -2 2 - -1 0 1 2 . . . 1 0 1 2 . . . Examples: , [x [x [x [x [x [x 0, y 0, y T, y T, y T, y T, y 43, z 43, z T, z T, z T, z T, z 0] 0], 0] 0], T] T] Infinite cardinality, but finite height
Three Value-Spaces [x 0, y 1, z 0] [x [x 0, y 0, y T, z , z 0] 0] (x = 0) (z = 0) [x 0, y 0, z 0] [x 0, y 2, z 0] Abstract Values Concrete Values Formulas
Three Value-Spaces [x 0, y 1, z 0] (x = 0) (z = 0) [x 0, y 0, z 0] [x [x 0, y 0, y 2, z 2, z 0] 0] [x 0, y 2, z 0] Abstract Values Concrete Values Formulas
Required Primitive Operations Abstraction (S) = store S (store) 0, y 0, y 2 2, z ([x 0, y 2, z 0]) = [x [x , z 0 0] ] Symbolic concretization ([x [x 0, y 0, y T, z , z 0] 0]) = (x = 0) (z = 0) Theorem prover returning a satisfying structure (store) S [x 0, y 2, z 0] (x = 0) (z = 0)
Required Primitive Operations Abstraction (S) = store S (store) 0, y 0, y 2 2, z ([x 0, y 2, z 0]) = [x [x , z 0 0] ] Symbolic concretization ([x [x 0, y 0, y T, z , z 0] 0]) = (x = 0) (z = 0) Theorem prover returning a satisfying structure (store) S [x 0, y 2, z 0] (z = 0) (x = y*z)
Constant Propagation [x 3, y 4, z 1] x = y * z e.e[x e(y)*e(z)] T[x = y * z] [x 4, y 4, z 1] T[x := y*z] =df (x = y * z) (y = y) (z = z) [x 3, y 4, z 1, x 4, y 4, z 1] (x = y * z) (y = y) (z = z)