Syntactic Analysis in Computer Science

csc 453 syntax analysis parsing n.w
1 / 69
Embed
Share

Dive into the world of syntax analysis (parsing) in computer science, exploring token sequences verification, symbol table management, context-free grammars, terminologies, and example derivations in the context of programming languages. Understand the essential concepts and components involved in ensuring syntactically correct programs.

  • Syntax Analysis
  • Computer Science
  • Parsing
  • Programming Languages

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. CSc 453 Syntax Analysis (Parsing) Saumya Debray The University of Arizona Tucson

  2. Overview tokens lexical analyzer (scanner) syntax analyzer (parser) source program syntax tree symbol table manager Main Task: Take a token sequence from the scanner and verify that it is a syntactically correct program. Secondary Tasks: Process declarations and set up symbol table information accordingly, in preparation for semantic analysis. Construct a syntax tree in preparation for intermediate code generation. CSc 453: Syntax Analysis 2

  3. Context-free Grammars A context-free grammar for a language specifies the syntactic structure of programs in that language. Components of a grammar: a finite set of tokens (obtained from the scanner); a set of variables representing related sets of strings, e.g., declarations, statements, expressions. a set of rules that show the structure of these strings. an indication of the top-level set of strings we care about. CSc 453: Syntax Analysis 3

  4. Context-free Grammars: Definition Formally, a context-free grammar G is a 4-tuple G = (V, T, P, S), where: V is a finite set of variables (or nonterminals). These describe sets of related strings. T is a finite set of terminals (i.e., tokens). P is a finite set of productions, each of the form A where A V is a variable, and (V T)* is a sequence of terminals and nonterminals. S V is the start symbol. CSc 453: Syntax Analysis 4

  5. Context-free Grammars: An Example A grammar for palindromic bit-strings: G = (V, T, P, S), where: V = { S, B } T = {0, 1} P = { S B, S , S 0 S 0, S 1 S 1, B 0, B 1 } CSc 453: Syntax Analysis 5

  6. Context-free Grammars: Terminology Derivation: Suppose that and are strings of grammar symbols, and A is a production. Then, A ( A derives ). : derives in one step * : derives in 0 or more steps * (0 steps) * if and * ( 1 steps) CSc 453: Syntax Analysis 6

  7. Derivations: Example Grammar for palindromes: G = (V, T, P, S), V = {S}, T = {0, 1}, P = { S 0 S 0 | 1 S 1 | 0 | 1 | }. A derivation of the string 10101: S 1 S 1 (using S 1S1) 1 0S0 1 (using S 0S0) 10101 (using S 1) CSc 453: Syntax Analysis 7

  8. Leftmost and Rightmost Derivations A leftmost derivation is one where, at each step, the leftmost nonterminal is replaced. (analogous for rightmost derivation) Example: a grammar for arithmetic expressions: E E + E | E * E | id Leftmost derivation: E E * E E + E * E id + E * E id + id * E id + id * id Rightmost derivation: E E + E E + E * E E + E * id E + id * id id + id * id CSc 453: Syntax Analysis 8

  9. Context-free Grammars: Terminology The language of a grammar G = (V,T,P,S) is L(G) = { w | w T* and S * w }. The language of a grammar contains only strings of terminal symbols. Two grammars G1 and G2 are equivalent if L(G1) = L(G2). CSc 453: Syntax Analysis 9

  10. Parse Trees A parse tree is a tree representation of a derivation. Constructing a parse tree: The root is the start symbol S of the grammar. Given a parse tree for X , if the next derivation step is X 1 n then the parse tree is obtained as: CSc 453: Syntax Analysis 10

  11. Exercise: Derivations and Parse Trees Grammar: G = (V, T, P, S) V = { A, B, C} T = {0, 1} P: A B C B 0 B 0 B 1 C 1 C C 0 S A Strings: 00100110 Problem: Leftmost derivation Rightmost derivation Parse trees 1. 2. 3. CSc 453: Syntax Analysis 11

  12. Approaches to Parsing Top-down parsing: attempts to figure out the derivation for the input string, starting from the start symbol. Bottom-up parsing: starting with the input string, attempts to derive in reverse and end up with the start symbol; forms the basis for parsers obtained from parser-generator tools such as yacc, bison. CSc 453: Syntax Analysis 12

  13. Top-down Parsing top-down: starting with the start symbol of the grammar, try to derive the input string. Parsing process: use the current state of the parser, and the next input token, to guide the derivation process. Implementation: use a finite state automaton augmented with a runtime stack ( pushdown automaton ). CSc 453: Syntax Analysis 13

  14. Bottom-up Parsing bottom-up: work backwards from the input string to obtain a derivation for it. Parsing process: use the parser state to keep track of: what has been seen so far, and given this, what the rest of the input might look like. Implementation: use a finite state automaton augmented with a runtime stack ( pushdown automaton ). CSc 453: Syntax Analysis 14

  15. Parsing: Top-down vs. Bottom-up S top-down parsing: starting at the start symbol, infers a derivation that gives the input string bottom-up parsing: starting from the input string, infers a derivation-in-reverse that gives the start symbol parse tree input string CSc 453: Syntax Analysis 15

  16. Parsing Problems: Ambiguity A grammar G is ambiguous if some string in L(G) has more than one parse tree. Equivalently: if some string in L(G) has more than one leftmost (rightmost) derivation. Example: The grammar E E + E | E * E | id is ambiguous, since id+id*id has multiple parses: CSc 453: Syntax Analysis 16

  17. Exercise: Ambiguity Which (if any) of these grammars are ambiguous? X X X X ( X ) X X A B A 0 A 1 | 1 B 1 B 0 | 0 CSc 453: Syntax Analysis 17

  18. Dealing with Ambiguity Transform the grammar to an equivalent unambiguous grammar. Augment ambiguous grammar with disambiguating rules to specify which parse to use. e.g.: %left , %right annotations in yacc Comment: It is not possible to determine algorithmically whether: Two given CFGs are equivalent; A given CFG is ambiguous. 1. 2. CSc 453: Syntax Analysis 18

  19. Bottom-up parsing: Approach (Implementation) Preprocess the grammar to compute some info about it. (FIRST and FOLLOW sets) Use this info to construct a pushdown automaton for the grammar: the automaton uses a table ( parsing table ) to guide its actions; constructing a parser amounts to constructing this table. 1. 2. CSc 453: Syntax Analysis 19

  20. Bottom-up parsing: Approach (Discussion) Background general structure of bottom-up parsers: shift-reduce parsers LR parsers Construction of LR parsing automata Construction of LR parse tables FIRST and FOLLOW sets Putting it all together 1. 2. 3. 4. CSc 453: Syntax Analysis 20

  21. GENERAL BACKGROUND CSc 453: Syntax Analysis 21

  22. Shift-reduce Parsing An instance of bottom-up parsing Basic idea: repeat in the string being processed, find a substring such that A is a production; replace the substring by A (i.e., reverse a derivation step). until we get the start symbol. Technical issues: Figuring out which substring to replace; and which production to reduce with. 1. 2. 1. 2. CSc 453: Syntax Analysis 22

  23. Shift-reduce Parsing: Example Grammar: S aABe A Abc | b B d Input: abbcde (using A b) aAbcde (using A Abc) aAde (using B d) aABe (using S aABe) S CSc 453: Syntax Analysis 23

  24. Shift-Reduce Parsing: contd Need to choose reductions carefully: abbcde aAbcde aAAcBe doesn t work. A handle of a string s is a substring s.t.: matches the RHS of a rule A ; and replacing by A (the LHS of the rule) represents a step in the reverse of a rightmost derivation of s. For shift-reduce parsing, reduce only handles. CSc 453: Syntax Analysis 24

  25. Shift-reduce Parsing: Implementation Data Structures: a stack, its bottom marked by $ . Initially empty. the input string, its right end marked by $ . Initially w. Actions: repeat Shift some ( 0) symbols from the input string onto the stack, until a handle appears on top of the stack. Reduce to the LHS of the appropriate production. until ready to accept. Acceptance: when input is empty and stack contains only the start symbol. 1. 2. CSc 453: Syntax Analysis 25

  26. Example Stack( ) $ $a $ab $aA $aAb $aAbc $aA $aAd $aAB $aABe $S Input abbcde$ bbcde$ bcde$ bcde$ cde$ Action shift shift reduce: A b shift shift reduce: A Abc shift reduce: B d shift reduce: S aABe accept Grammar : S aABe A Abc | b B d de$ de$ e$ e$ $ $ CSc 453: Syntax Analysis 26

  27. Conflicts Can t decide whether to shift or to reduce both seem OK ( shift-reduce conflict ). Example: S if E then S | if E then S else S | Can t decide which production to reduce with several may fit ( reduce-reduce conflict ). Example: Stmt id ( args ) | Expr Expr id ( args ) CSc 453: Syntax Analysis 27

  28. LR Parsing A kind of shift-reduce parsing. An LR(k) parser: scans the input L-to-R; produces a Rightmost derivation (in reverse); and uses k tokens of lookahead. Advantages: very general and flexible, and handles a wide class of grammars; efficiently implementable. Disadvantages: difficult to implement by hand (use tools such as yacc or bison). CSc 453: Syntax Analysis 28

  29. LR PARSING: HOW THE PARSER WORKS CSc 453: Syntax Analysis 29

  30. LR Parsing: Schematic The driver program is the same for all LR parsers (SLR(1), LALR(1), LR(1), ). Only the parse table changes. Different LR parsing algorithms involve different tradeoffs between parsing power, parse table size. CSc 453: Syntax Analysis 30

  31. LR Parsing: the parser stack The parser stack holds strings of the form s0 X1s1 X2s2 Xmsm (smis on top) where si are parser states and Xi are grammar symbols. (Note: the Xi and si always come in pairs, with the state component si on top.) A parser configuration is a pair stack contents, unexpended input CSc 453: Syntax Analysis 31

  32. LR Parsing: Roadmap LR parsing algorithm: parse table structure parsing actions Parse table construction: viable prefix automaton parse table construction from this automaton improving parsing power: different LR parsing algorithms CSc 453: Syntax Analysis 32

  33. LR Parse Tables The parse table has two parts: the action function and the goto function. At each point, the parser s next move is given by action[sm, ai], where: sm is the state on top of the parser stack, and ai the next input token. The goto function is used only during reduce moves. CSc 453: Syntax Analysis 33

  34. LR Parser Actions: shift Suppose: the parser configuration is s0 X1s1 Xmsm, ai an , and action[sm, ai] = shiftsn . Effects of shift move: push the next input symbol ai; and push the state sn 1. 2. New configuration: s0 X1s1 Xmsm ai sn, ai+1 an CSc 453: Syntax Analysis 34

  35. LR Parser Actions: reduce Suppose: the parser configuration is s0 X1s1 Xmsm, ai an , and action[sm, ai] = reduceA . Effects of reduce move: pop n states and n grammar symbols off the stack (2n symbols total), where n = | |. suppose the (newly uncovered) state on top of the stack is t, and goto[t, A] = u. push A, then u. New configuration: s0 X1s1 Xm-nsm-n Au, ai an 1. 2. 3. CSc 453: Syntax Analysis 35

  36. LR Parsing Algorithm setipto the start of the input string w$. while TRUE do: let s = state on top of parser stack, a = input symbol pointed at by ip. if action[s,a] == shiftt then: (i) push the input symbol a on the stack, then the state t; (ii) advance ip. if action[s,a] == reduceA then: (i) pop 2*| | symbols off the stack; (ii) suppose t is the state that now gets uncovered on the stack; (iii) push the LHS grammar symbol A and the state u = goto[A, t]. if action[s,a] == accept then accept; else signal a syntax error. 1. 2. 1. 2. 3. 4. 5. CSc 453: Syntax Analysis 36

  37. LR PARSING AUTOMATA CSc 453: Syntax Analysis 37

  38. LR parsing: Viable Prefixes Goal: to be able to identify handles, and so produce a rightmost derivation in reverse. Given a configuration s0 X1s1 Xmsm, ai an : X1 X2 Xm ai anis obtainable on a rightmost derivation. X1 X2 Xm is called a viable prefix. The set of viable prefixes of a grammar are recognizable using a finite automaton. This automaton is used to recognize handles. CSc 453: Syntax Analysis 38

  39. Viable Prefix Automata An LR(0) item of a grammar G is a production of G with a dot somewhere in the RHS. Example: The rule A aAb gives these LR(0) items: A aAb A a Ab A aA b A aAb Intuition: A denotes that: we ve seen something derivable from ; and it would be legal to see something derivable from at this point. CSc 453: Syntax Analysis 39

  40. Overall Approach Given a grammar G with start symbol S: Construct the augmented grammar by adding a new start symbol S and a new production S S. Construct a finite state automaton whose start state is labeled by the LR(0) item S S. Use this automaton to construct the parsing table. CSc 453: Syntax Analysis 40

  41. Viable Prefix NFA for LR(0) items Each state is labeled by an LR(0) item. The initial state is labeled S S. Transitions: 1. where X is a terminal or nonterminal. 2. where X is a nonterminal, and X is a production. CSc 453: Syntax Analysis 41

  42. Viable Prefix NFA: Example Grammar : S 0 S 1 S CSc 453: Syntax Analysis 42

  43. Viable Prefix NFA DFA Given a set of LR(0) items I, the set closure(I) is constructed as follows: repeat add every item in I to closure(I); if A B closure(I) and B is a nonterminal, then for each production B , add the item B to closure(I). until no new items can be added to closure(I). 1. 2. Intuition: A B closure(I) means something derivable from B is legal at this point. This means that something derivable from B (and thus ) is also legal. CSc 453: Syntax Analysis 43

  44. Viable Prefix NFA DFA (cont d) Given a set of LR(0) items I, the set goto(I,X) is defined as goto(I, X) = closure({ A X | A X I }) Intuition: if A X I then (a) we ve seen something derivable from ; and (b) something derivable from X would be legal at this point. Suppose we now see something derivable from X. The parser should go to a state where (a) we ve seen something derivable from X; and (b) something derivable from would be legal. CSc 453: Syntax Analysis 44

  45. Example Let I0= {S S}. I1 = closure(I0) = { S S, /* from I0 */ S 0 S 1, S } goto (I1, 0) = closure( { S 0 S 1 } ) = {S 0 S 1, S 0 S 1, S } CSc 453: Syntax Analysis 45

  46. Viable Prefix DFA for LR(0) Items Given a grammar G with start symbol S, construct the augmented grammar with new start symbol S and new production S S. C = { closure({ S S }) }; // C = a set of sets of items = set of parser states repeat : for each set of items I C : for each grammar symbol X : if ( goto(I,X) && goto(I,X) C ) : // new state add goto(I,X) to C; } until no change to C; returnC. The set Cgives the viable prefix automaton s states; the goto() function gives the transitions between states. 1. 2. 3. 4. CSc 453: Syntax Analysis 46

  47. PARSE TABLE CONSTRUCTION CSc 453: Syntax Analysis 47

  48. SLR(1) Parse Table Construction I Given a grammar G with start symbol S: Construct the augmented grammar G with start symbol S . Construct the set of states {I0, I1, , In} for the Viable Prefix DFA for the augmented grammar G . Each DFA state Ii corresponds to a parser state si. The initial parser state s0 coresponds to the DFA state I0 obtained from the item S S. The parser actions in state si are defined by the items in the DFA state Ii. CSc 453: Syntax Analysis 48

  49. Figuring out parsing actions I Suppose a DFA state I contains an item A a where a is a terminal. In state I, we re looking to see an a in the input. If we see an a in the input, we should shift. A shift action needs a state s to be pushed this state is computed from the DFA s transitions: s = goto(I, a) CSc 453: Syntax Analysis 49

  50. Figuring out parsing actions II When should we reduce? the parser state should have an item of the form A Need to be careful about what terminals (lookahead tokens) we reduce on too generous with reduce actions conflicts SLR(1) parsers: reduce with A only if the next token can legitimately follow A. need to figure out which tokens can legitimately follow A. given by FIRST and FOLLOW sets CSc 453: Syntax Analysis 50

More Related Content