Understanding Dataflow Analysis in Program Optimization

Slide Note
Embed
Share

Dataflow analysis is crucial for optimizing programs by identifying opportunities for improvements like constant propagation and dead code elimination. This analysis involves modeling values of interest, formulating fixpoint equations, and solving them to enhance program efficiency. Concepts such as statement definitions, uses, live variables, and use/def for each 3-address operation play a vital role in this process. Aliasing, dataflow information, and program points are also key aspects in the realm of program optimization.


Uploaded on Oct 05, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Dataflow Analysis 1

  2. Overview To perform optimizations like constant propagation or dead code elimination, we must Analyze program to find opportunities for performing optimizations safely Transform program Analysis is called dataflow analysis Model values of interest as domain Formulate system of fixpoint equations whose unknowns are the required information at different points in program Solve system of equations Later we will study other types of program analyses 2

  3. Some concepts A statement is a definition of a variable v if it may write to v. A statement is a use of variable v if it may read from v. A variable v is live at a point p in a CFG if there is a path from p to END on which there is a use of v before any definition of v. {x,y,z,d,c} if (c) x = y+1 y=2*z if(d) {x} x=y+z z=1 z=x {} 3

  4. Use/Def for each 3-address op if I is x = y OP z : use[I] = {y, z} def[I] = {x} if I is x = OP y : use[I] = {y} if I is x = y : use[I] = {y} if I is if (x) : use[I] = {x} if I is return x : use[I] = {x} if I is x = f(y1 , , yn) : use[I] = {y1, , yn} def[I] = {x} def[I] = {x} def[I] = {} def[I] = {} def[I] = {x} 4

  5. Aliasing Two or more program names for the same memory location Can happen if you have pointers or call-by-reference parameters x = 3 y = @x *y = 5 x x and *y are aliases Aliasing complicates def/use computation We will assume no aliasing for now so variable name is enough to determine defs and uses

  6. Dataflow information Facts about program execution that are useful for optimization at compile-time Examples: Is the value of x always a constant at this point of the program? If so, what is the value? Analysis: constant propagation Given a use of x in a statement, which assignment statements for x could have written the value at that use? Analysis: reaching definitions Is this computation of x+y redundant because it has been computed earlier in the program? Analysis: common subexpression elimination

  7. Program Points Two program points for each instruction: There is a program point before each instruction There is a program point after each instruction Point before x = y+1 Point after In a basic block: Program point after an instruction = program point before the successor instruction 7

  8. Program Points: Example Multiple successor blocks means that point at the end of a block has multiple successor program points x = y+1 y =2*z if (d) Depending on the execution, control flows from a program point to one of its successors x = y+z z = 1 8

  9. Reaching definitions A definition d of variable v is said to reach a point p if there is a path from START to p that contains d there are no other definitions of v on that path after d Convention: START has a definition for all variables of interest How do we compute reaching definitions at each point of program? To keep things simple, we consider only one variable in general, you would perform analysis for all variables of interest START {START} d1 a:= . {d1} if p(a) d2 d3 a:= . {d2} a:= . {d3} a... {d2,d3}

  10. One solution: meet-over-paths To find definitions that reach a point p Enumerate all paths from START to p For each path, find the definition that reaches p Compute the union of these definitions This is called the meet-over-paths (MOP)solution Confluence operation: How do we combine information from different paths to a given point? Problem: If program has loops, set of paths to p will be unbounded Better idea: dataflow equations START {START} d1 a:= . {d1} if p(a) d2 d3 a:= . {d2} a:= . {d3} a... {d2,d3}

  11. Dataflow Equations Dataflow equations System of fixpoint equations in which unknowns are solutions to the dataflow problem at different points in program Solve the system of equations as described in previous lecture For many problems, this gives the MOP solution and for other problems, it gives a safe approximation to MOP solution Safe approximation to MOP solution Not as precise as MOP solution but will not lead to incorrect optimization Example: an analysis that says no variable is a constant anywhere in the program is a safe approximation to MOP Granularity of dataflow equations Statement-level: unknowns are associated with input or output points of each statement Basic-block level: more common START {START} d1 a:= . {d1} if p(a) d2 d3 a:= . {d2} a:= . {d3} a... {d2,d3}

  12. Reaching definitions: domain START x0 d1 a:= . x1 if p(a) x2 Domain is Power-set ({START,d1,d2,d3}) x2 d2 d3 a:= . x3 a:= . x4 a... x5 Unknowns: reaching definitions at output of each statement. Solution at each point will be some element of domain.

  13. Reaching definitions: equations START Give a name to output of each statement x0 d1 For each statement, write down equation for output as a function of its input(s). a:= . x1 Rule: if p(a) x2 x2 xi1 xi2 dn d2 d3 (v==a): equation is xo = {dn} otherwise: xo = xi1 U xi2 v:= . a:= . x3 a:= . xo x4 x0 = {START} x1 = {d1} x2 = x1 x3 = {d2} x4 = {d3} x5 = x3 U x4 a... x5

  14. Reaching definitions: multiple variables You can solve for one variable at a time, but it is better to solve for all variables at the same time Assignment to variable x will kill all definitions of x that reach input of assignment but will pass through all other assignments generate itself at output Equation: Out = (In Kill) U Gen Notation: Dx = set of definitions of variable x xi1 xi2 dn Dataflow equation: xo = ((xi1 U xi2) Dv) U {dn} v:= . xo

  15. More complicated example START 1 x1 = {START} x2 = x1 U x7 x3 = (x2 {d6,d3}) U {d3} x4 = (x3 {d4}) U {d4} x5 = x4 x6 = (x5 {d3,d6}) U {d6} x7 = ((x6 U x5) {d7,d8}) U d7 x8 = (x2 {d7,d8}) U {d8} 2 if (c) x = y+1 3 y=2*z 4 if(d) 5 x=y+z 6 7 z=1 z=x 8

  16. Observations System of equations can be solved to find least solution Iterative method for solving equations will converge even though there are an unbounded number of paths in program Intuitively we need to consider only a finite number of paths to find solution to reaching definitions problem since longer paths do not give us more information In general, when you have loops, the system of equations will have multiple solutions. Exercise: convince yourself using the example in the last slide Which one do we want? We want the least solution. So in iterative solution, initialize all unknowns to {}.

  17. Classification of dataflow problems Direction of information propagation Forward-flow: information propagates in the direction of control-flow from input to output Backward-flow: information propagates in reverse direction of control-flow from output to input All-paths vs. any-path All-paths problem: dataflow fact is true at some point in program if it is true along all paths to/from that point Any-path problem: dataflow fact is true at some point in program if it is true along any path to/from that point Classification of reaching definitions? START p END 17

  18. Examples All-paths Any-path Constant propagation Available expressions Reaching definitions Forward Anticipatable expressions Live variables Backward START p END 18

  19. Recall START {x,y,z,d,c} if (c) A variable v is live at a point p in a CFG if there is a path from p to END on which there is a use of v before any definition of v x = y+1 We want to compute the set of variables live at each point in program y=2*z Backward-flow problem: information at p depends on paths from p to END if(d) Any-path problem: variable is live at p if there is any path from p to END that satisfies the condition x=y+z z=1 Domain: power-set of set of variables in procedure {x} z=x {} END

  20. Straight-line code Relation between Live sets: Live1 = ( Live2-{x} ) {y} Live2 = ( Live3-{y} ) {z} Live3 = ( Live4-{} ) {d} Live1 x = y+1 Live2 y =2*z Live3 if (d) Live4 I1 I2 Relation: in[I] = ( out[I] def[I] ) use[I] I3 Information flows backward! in[I] I Instructions: can compute in[I] if we know out[I] out[I] 20

  21. Analyze Control Flow A variables is live at end of block B if it is live at the beginning of one (or more) successor blocks Characterizes all possible program executions Mathematically: out[B] = in[B ] B out[B] in[B1] B1 in[Bn] Bn B succ(B) Confluence operator is union Compute least solution 21

  22. Live variables START {x,y,z,d,c} if (c) L10 = { } L3 = {x} U (L10- {z}) L2 = L4 U L3 U {c} L7 = L2 {z} L8 = {y,z} U (L7 {x}) L6 = L8 U L7 U {d} L5 = {z} U (L6 {y}) L4 = {y} U (L5 {x}) 2 {x,y,z,d,c} x = y+1 {x,z,d,c} 4 y=2*z 5 {x,y,z,d,c} 6 if(d) {y,z,d,c} {x,y,d,c} 8 x=y+z {x,y,d,c} 7 z=1 {x} 3 z=x {} 10 END

  23. Available expressions An expression x op y is available at p if every path from START to p contains an evaluation of x op y after which there are no assignments to x or y. Classification: forward-flow, all-paths Domain: power-set of expressions of interest Equation for x = y op z : Out = (In Ex) U { y op z } Confluence operator: intersection Compute greatest solution: start by assuming all expressions are available everywhere except at START and iterate START START if (c) if (c) z = x+y t = x+y z=t w=x+y t=x+y w=t B5 B5 x+y.. t.. Computation in B5 is totally redundant .

  24. Anticipatable expressions An expression x op y is anticipatable at p if every path from p to END contains an evaluation of x op y before any assignment to x or y. Classification: backward-flow, all- paths Domain: power-set of expressions of interest Equation for x = y op z : In = (Out Ex) U { y op z } Confluence operator: intersection Compute greatest solution: start by assuming all expressions are anticipatable everywhere except at END and iterate START START if (c) if (c) z = x+y t = x+y z=t t=x+y B5 B5 x+y.. t.. Computation in B5 is partially redundant . in B5 is removed. Partial redundancy

  25. Constant propagation A variable x is a constant c at a point p if x has the value c at that point on all paths from START to that point. Classification: forward-flow, all-paths Perform constant propagation on set of variables V Dataflow information at point p: vector of size |V| where each value comes from domain Equation for statement x = e OUT = IN [(Eval(e) in IN) / x] Confluence operation: join Initialize all vectors to [ , , ,..] except at START where it is [T,T,..T] Definitely not constant T . -2 -1 0 1 2 May or may not be constant Domain for constant propagation

  26. Constant propagation (contd.) Solution to dataflow equations does not give meet-over-paths solution in general Gives a safe approximation Example on left: In B5, x+y is constant 5 but dataflow solution will give T, which is a safe approximation START [T,T] if (c) [T,T] [T,T] x = 2 y = 3 x = 3 y = 2 [3,2] [2,3] [T,T] x+y.. B5 Dataflow vector: [x,y]

  27. Implementing constant propagation Using vectors to represent values of variables is inefficient Assignment is to one variable but values of other variables must be copied from input vector to output vector Idea: can we use def-use chains to perform constant propagation? If we have an assignment x = 3 , find all uses reached only by this definition and replace use with constant 3. This works but it may not find all the constants that the CFG algorithm does Solution: use a variant of def-use chains called the static single assignment form (SSA) See later IN x = e OUT OUT = IN [(Eval(e) in IN) / x]

  28. SOLVING DATAFLOW EQUATIONS USING ELIMINATION

  29. High-level idea Solving systems of linear equations Iterative methods: Jacobi, Gauss-Seidel,.. Elimination-based methods: Gaussian elimination Dataflow equations We have seen iterative method For structured programs, we can use elimination- based methods and avoid iteration For unstructured programs, elimination can be used to reduce the number of equations given to iterative solver

  30. Intuitive idea of elimination Reaching-definitions Basic-block level equations .. Out[B5] = Out[B4] Out[B6] = {d7} U (Out[B5]- Dz) Out[B7] = {d9} U (Out[B5]- Dz) .. Replace equations with this .. Out[F] = {d7,d9} U (Out[B4]-Dz) Solve equations iteratively and then interpolate solution into F START . B4 F B5 if (c) d9 d7 z = x+y B6 z=w*w B7 ... B8

  31. General idea for reaching definitions gen[R]: set of definitions in R for which there is a path from that definition to exit free of other definitions of that variable kill[R]: set of definitions in R that do not reach exit even if they reach entry of R Equation for region R: out[R] = gen[R] U (in[R] kill[R]) entry exit

  32. Elimination and interpolation

  33. Elimination-based methods Same idea can be used for other dataflow problems Compute gen and kill sets For structured programs Can solve dataflow equations without iteration Can compute dataflow solution from AST without building control-flow graph For unstructured programs Generalization of loop: interval Single-entry multiple-exit loop Reducible program: program that can be decomposed into nested intervals (Cocke and Allen) Irreducible program: needs iteration John Cocke (in wheelchair) and Fran Allen (pink shirt). Both won the Turing Award

  34. Summary Program analysis is needed to perform optimizations safely Analysis must consider all possible execution paths Meet over paths (MOP) solution Dataflow analysis Solve a set of fixpoint equations to find dataflow information Domain: usually a powerset ordered by containment Transfer function for assignment statements Confluence operator: usually either union or intersection Type of solution Confluence operator is union: least solution Confluence operator is intersection: greatest solution For many problems, dataflow solution gives MOP solution. For other problems like constant propagation, it gives a safe approximation to MOP. In practice Compute transfer function for entire basic blocks so you have as many unknowns as basic blocks rather than statements, and then iterate For structured programs, you can avoid iteration entirely: elimination- based methods Sets are represented using bit-vectors and union/intersection are implemented using bitwise OR/AND

Related