Context-Free Grammars in Systems Programming

 
 
CS252: Systems Programming
 
Ninghui Li
Topic 5: Parsing
Prepared by Evan Hanau
ehanau@purdue.edu
 
 
 
Introduction to Parsing with Yacc
 
An Introduction to Parsing with Yacc
 
Context-Free Grammars
Yacc Parsing
An example Infix Calculator Program
 
 
 
Context-Free Grammar
 
Background: The Context-Free Grammar
 
By CS252 you are already somewhat
familiar with Regular Expressions.
 
Regular expressions can be used to
describe regular languages, which belong
to a larger classification of language types.
 
 
 
In CS, we classify languages on the Chomsky
Hierarchy.
 
Type-0 Recursively Enumerable
Type-1 Context-Sensitive
Type-2 Context-Free
Type-3 Regular
 
 
Context-Free Grammar
 
Type-(i) is a superset of Type-(i+1)
 
 
 
Languages generated by regular
expressions belong to type 3.
Note: Your specific regular expression
engine (e.g. POSIX extended RE) is likely
capable of more complex productions.
In any case, we need more than regular
expressions to parse computer
programming languages and shell scripts.
 
 
Context-Free Grammar
 
 
 
You can do a great deal with regular
expressions.
 
Exercise: Create a regular expression that
matches on 
any
 English phrase that is a
palindrome, for instance the string “some
men interpret nine memos”.
 
 
 
Context-Free Grammar
 
 
 
This is in fact not possible with regex (by its strict CS
definition!). You would be limited to palindromes of
a finite length only.
RE cannot express “a
n
b
n
”, a string with some number of
a’s followed by equal number of b’s
The expression a*b* does not require number of a’s
equal that of b’s
We must use a 
context-free grammar
 to describe
palindromes and other constructs.
More powerful than a regular expression, and useful
when some notion of “what came before” is required.
 
 
 
 
Context-Free Grammar
 
 
 
BNF or Backus-Naur form is used in CS to
describe context-free grammars. It is often used
to describe the syntax of programming
languages.consists of one or more of the
following:
 
<nonterminal> ::= __expression__
 
Where __expression__ consists of one or more
terminals
 and 
nonterminals
 or nothing (epsilon).
 
 
 
Backus-Naur form
 
 
 
<postal-address> ::= <name-part> <street-address> <zip-part>
 
<name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL>
             | <personal-part> <name-part>
 
<personal-part> ::= <first-name> | <initial> "."
 
<street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL>
 
<zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>
 
<opt-suffix-part> ::= "Sr." | "Jr." | <roman-numeral> | ""
 
<opt-apt-num> ::= <apt-num> | ""
 
US Post Address in 
Backus-Naur
form (from wikipedia)
 
 
 
Let’s define a grammar for a primitive add or
multiply expression:
 
<expr> ::= <expr> * <expr>
        |  <expr> + <expr>
   
 |  number
 
In this case, <expr> is a terminal and number,
*, and + are the nonterminals.
 
 
 
 
 
Context-Free Grammar for
Simple Expressions
 
 
 
Clearly, there is some ambiguity here,
because operator precedence (sometimes
referred to as 
binding
) is not defined.
 
The grammar does not distinguish between
2+2*2+2 = 16 (incorrect under normal
rules) or 2+2*2+2 = 8 (correct).
 
 
 
 
 
 
 
Context-Free Grammar
 
 
 
One Solution: Define expressions of different
levels:
<expr> ::= <add_expr>
<add_expr> ::= <add_expr> + <mul_expr>
             | <mul_expr>
<mul_expr> ::= <mul_expr> * number
             | number
Now, multiplication will bind tighter than
addition (this may require a few sample
expressions to wrap your head around!)
 
 
 
 
Context-Free Grammar
 
 
 
Associativity follows from the above example
(Hint: What side of the multiply and add
operation did we have the “deeper”
production on?)
 
 
 
 
Context-Free Grammar
 
 
 
<palindrome> ::= letter
                        |   
          // empty string
<palindrome> ::= “a” <palindrome> “a”
                         |    “b” <palindrome> “b”
                         
….
Or, <palindrome> ::= letter <palindrome> letter
 
However, we need to check the two letters are the same.
CFG for anbn:
<anbn> = 
              // empty string
              | “a” <anbn> “b”
 
CFG for 
palindrome
 
 
 
Chomsky Hierarchy (From
Wikipedia Page)
 
 
 
Type-0 Recursively Enumerable
Type-1 Context-Sensitive
 
 
Cannot encode all strings r
1
r
2
 such that r
1
 and r
2
are two regular expressions that are equivalent
Type-2 Context-Free (Pushdown Automaton, i.e,
Finite State Automaton with a Stack)
 
Can encode a
n
b
n
, but not a
n
b
n
c
n
Type-3 Regular   (Finite State Automaton)
 
Can encode a*b*, but not a
n
b
n
 
 
Chomsky Hierarchy Revisited.
 
 
 
Why is Context-Free Grammar
Called Context Free?
 
In a CFG, the left hand of each production is a single
non-terminal, e.g.,
 
<palindrome> ::= “a” <palindrome> “a”
 
This means that “a”, followed by a <palindrome>, and by
“a” will always be considered as <palindrome>, no matter
what is the context, hence context free.
 
In a Context-Sensitive Grammar, left hand of production rules
can include other things
 
 
 
An Example Context-Sensitive
Grammar for 
a
n
b
n
c
n
 
1.    S  
 a B C
2.    S 

 a S B C
3. B C 

 C B
4. a B 
 a b
5. b B 
 b b
6. b C 
  b c
7. c C 
  c c
 
 
  
S
2
 
aSBC
1
 
a
aBC
BC
 
3
 
aaB
BCC
 
4
 
a
ab
B
CC
 
5
 
aa
bb
CC
 
6
 
aab
bc
C
 
7
 
aabb
cc
 
 
 
Yacc & Parsing
 
There are many ways to parse BNF grammars,
most of which are discussed in a compilers
course.
Recall: A finite state automaton (FSA) is
used for regular expressions. (CS182).
For a context-free grammar, we use a
pushdown automaton, which combines
features of a FSA with a stack.
 
 
 
 
 
 
Yacc generates what is known as a LALR
parser, which is generated from the BNF
grammar in your Yacc file. This parser is
defined in the C source file that Yacc
generates.
 
We use Lex to make a lexer to generate our
terminals, which are matched with regular
expressions before being fed into the parser.
 
 
 
 
Yacc & Parsing
 
 
 
 
 
 
 
Yacc is capable of generating a powerful
parser that will handle many different
grammars.
 
LEX
Lexer
YACC
Parser
Rule-
Based
Behavior
 
terminals
 
Input Characters
 
Yacc & Parsing
 
 
 
Recall that parsing combines a state machine with
a stack. States go on a stack to keep track of
where parsing is. Yacc uses a 
parse table
 which
defines possible states.
Yacc’s parser operates using two primary actions,
shift 
and 
reduce.
shift
 puts a state on the stack, reduce pops state(s)
off the stack and 
reduces
 combinations of
nonterminals and terminals to a single
nonterminal. After a reduction to a rule, Yacc’s
parser will optionally run some user-defined
code.
 
 
 
 
Yacc & Parsing
 
 
 
A 
very
 basic example:
 
<rule> := “hello” “world” “\n”
 
The parser would shift each word,
successively pushing each state 
(.”hello”,
.”world”, .”\n”
) onto the stack. Then at the
end of the rule, reduce everything to 
<rule>
and pop the three states.
 
 
 
 
Yacc & Parsing
 
 
 
A Lex/Yacc Infix Calculator
 
Yacc’s parser is powerful, but is not
capable of parsing all grammars.
 
Certain ambiguous grammars may produce
what is known as a
 shift/reduce
 or
reduce/reduce
 conflict. Yacc will, by
default, shift instead of reduce.
 
 
 
Consider the classic 
shift/reduce
 conflict
example:
 
<ifexp> ::= IF <expr> THEN <stmt> ELSE <stmt
>
       
|   IF <expr> THEN <stmt>
 
Yacc will have a shift/reduce conflict here, but
will go with shift (the top option) by default.
It’s greedy!
 
 
 
 
Yacc & Parsing
 
 
 
A Lex/Yacc Infix Calculator
 
To demonstrate the utility of Lex and Yacc
(or in our case, Flex and Bison) we provide
an example infix calculator.
 
Similar to several of the examples provided
on the Lex and Yacc manpage at
http://dinosaur.compilertools.net, but with
added features
 
 
 
A Lex/Yacc Infix Calculator
 
Make sure to read ALL source code
comments, particularly those that describe
source file organization.
 
Lex definition file: calculator.l
Yacc grammar file: calculator.y
AST Classes: ast.cc
Symbol table: symtab.cc
 
 
 
A Lex/Yacc Infix Calculator
 
The example calculator application uses Lex
and Yacc to parse mathematical expressions
and produce an Abstract Syntax Tree, which is
then used to evaluate those expressions.
 
It allows the =, +, *, -, +, ^, () and unary
minus operators, with appropriate levels of
binding and precedence. Examine calculator.y,
because it is heavily commented.
 
 
 
 
 
 
 
A Lex/Yacc Infix Calculator
 
The symbol table (implemented here in
simple O(n) access time) maps variables to
values.
 
Print the AST after every expression
evaluation by running calculator with the –t
flag, e.g. “calc –t”.
 
 
 
 
 
 
 
 
calc> 2*2^3/3
 = 5.333333
    3.000
  /
(/)
  \
            3.000
          /
        (^)
          \
            2.000
      /
    (*)
      \
        2.000
 
A calculator example. Type “2*2^3/3 and press enter:
 
A Lex/Yacc Infix Calculator
 
 
 
Review
 
What are required:
Able to write simple Context Free
Grammars, similar to those used in
implementing FIZ
Able to determine whether a string of
tokens is accepted by a grammar
Able to show how a string of tokens is
parsed into some non-terminal (i.e.,
draw the parsing tree)
 
 
Slide Note
Embed
Share

This insightful content delves into the world of context-free grammars, explaining their significance in parsing computer programming languages. It discusses the hierarchy of language classification, the limitations of regular expressions in expressing certain constructs like palindromes, and the utility of Backus-Naur Form (BNF) in describing context-free grammars. The included images and examples enhance comprehension of this fundamental topic in systems programming.

  • Context-Free Grammars
  • Parsing
  • Systems Programming
  • Backus-Naur Form
  • Language Classification

Uploaded on Oct 03, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. CS252: Systems Programming Ninghui Li Topic 5: Parsing Prepared by Evan Hanau ehanau@purdue.edu

  2. Introduction to Parsing with Yacc An Introduction to Parsing with Yacc Context-Free Grammars Yacc Parsing An example Infix Calculator Program

  3. Context-Free Grammar Background: The Context-Free Grammar By CS252 you are already somewhat familiar with Regular Expressions. Regular expressions can be used to describe regular languages, which belong to a larger classification of language types.

  4. Context-Free Grammar In CS, we classify languages on the Chomsky Hierarchy. Type-0 Recursively Enumerable Type-1 Context-Sensitive Type-2 Context-Free Type-3 Regular Type-(i) is a superset of Type-(i+1)

  5. Context-Free Grammar Languages generated by regular expressions belong to type 3. Note: Your specific regular expression engine (e.g. POSIX extended RE) is likely capable of more complex productions. In any case, we need more than regular expressions to parse computer programming languages and shell scripts.

  6. Context-Free Grammar You can do a great deal with regular expressions. Exercise: Create a regular expression that matches on any English phrase that is a palindrome, for instance the string some men interpret nine memos .

  7. Context-Free Grammar This is in fact not possible with regex (by its strict CS definition!). You would be limited to palindromes of a finite length only. RE cannot express anbn , a string with some number of a s followed by equal number of b s The expression a*b* does not require number of a s equal that of b s We must use a context-free grammar to describe palindromes and other constructs. More powerful than a regular expression, and useful when some notion of what came before is required.

  8. Backus-Naur form BNF or Backus-Naur form is used in CS to describe context-free grammars. It is often used to describe the syntax of programming languages.consists of one or more of the following: <nonterminal> ::= __expression__ Where __expression__ consists of one or more terminals and nonterminals or nothing (epsilon).

  9. US Post Address in Backus-Naur form (from wikipedia) <postal-address> ::= <name-part> <street-address> <zip-part> <name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL> | <personal-part> <name-part> <personal-part> ::= <first-name> | <initial> "." <street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL> <zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL> <opt-suffix-part> ::= "Sr." | "Jr." | <roman-numeral> | "" <opt-apt-num> ::= <apt-num> | ""

  10. Context-Free Grammar for Simple Expressions Let s define a grammar for a primitive add or multiply expression: <expr> ::= <expr> * <expr> | <expr> + <expr> | number In this case, <expr> is a terminal and number, *, and + are the nonterminals.

  11. Context-Free Grammar Clearly, there is some ambiguity here, because operator precedence (sometimes referred to as binding) is not defined. The grammar does not distinguish between 2+2*2+2 = 16 (incorrect under normal rules) or 2+2*2+2 = 8 (correct).

  12. Context-Free Grammar One Solution: Define expressions of different levels: <expr> ::= <add_expr> <add_expr> ::= <add_expr> + <mul_expr> | <mul_expr> <mul_expr> ::= <mul_expr> * number | number Now, multiplication will bind tighter than addition (this may require a few sample expressions to wrap your head around!)

  13. Context-Free Grammar Associativity follows from the above example (Hint: What side of the multiply and add operation did we have the deeper production on?)

  14. CFG for palindrome <palindrome> ::= letter | // empty string <palindrome> ::= a <palindrome> a | b <palindrome> b . Or, <palindrome> ::= letter <palindrome> letter However, we need to check the two letters are the same. CFG for anbn: <anbn> = // empty string | a <anbn> b

  15. Chomsky Hierarchy (From Wikipedia Page) Production rules (constraints) Grammar Languages Automaton Recursively enumerable Type-0 Turing machine (no restrictions) Linear-bounded non-deterministic Turing machine Non- deterministic push down automaton (equivalently, Right side no shorter than left) Type-1 Context-sensitive Type-2 Context-free Finite state automaton Type-3 Regular and

  16. Chomsky Hierarchy Revisited. Type-0 Recursively Enumerable Type-1 Context-Sensitive Cannot encode all strings r1r2 such that r1 and r2 are two regular expressions that are equivalent Type-2 Context-Free (Pushdown Automaton, i.e, Finite State Automaton with a Stack) Can encode anbn, but not anbncn Type-3 Regular (Finite State Automaton) Can encode a*b*, but not anbn

  17. Why is Context-Free Grammar Called Context Free? In a CFG, the left hand of each production is a single non-terminal, e.g., <palindrome> ::= a <palindrome> a This means that a , followed by a <palindrome>, and by a will always be considered as <palindrome>, no matter what is the context, hence context free. In a Context-Sensitive Grammar, left hand of production rules can include other things

  18. An Example Context-Sensitive Grammar for anbncn 1. S a B C 2. S a S B C 3. B C C B 4. a B a b 5. b B b b 6. b C b c 7. c C c c S 2aSBC 1aaBCBC 3aaBBCC 4aabBCC 5aabbCC 6aabbcC 7aabbcc

  19. Yacc & Parsing There are many ways to parse BNF grammars, most of which are discussed in a compilers course. Recall: A finite state automaton (FSA) is used for regular expressions. (CS182). For a context-free grammar, we use a pushdown automaton, which combines features of a FSA with a stack.

  20. Yacc & Parsing Yacc generates what is known as a LALR parser, which is generated from the BNF grammar in your Yacc file. This parser is defined in the C source file that Yacc generates. We use Lex to make a lexer to generate our terminals, which are matched with regular expressions before being fed into the parser.

  21. Yacc & Parsing Input Characters Rule- Based Behavior terminals LEX Lexer YACC Parser Yacc is capable of generating a powerful parser that will handle many different grammars.

  22. Yacc & Parsing Recall that parsing combines a state machine with a stack. States go on a stack to keep track of where parsing is. Yacc uses a parse table which defines possible states. Yacc s parser operates using two primary actions, shift and reduce. shift puts a state on the stack, reduce pops state(s) off the stack and reduces combinations of nonterminals and terminals to a single nonterminal. After a reduction to a rule, Yacc s parser will optionally run some user-defined code.

  23. Yacc & Parsing A very basic example: <rule> := hello world \n The parser would shift each word, successively pushing each state (. hello , . world , . \n ) onto the stack. Then at the end of the rule, reduce everything to <rule> and pop the three states.

  24. A Lex/Yacc Infix Calculator Yacc s parser is powerful, but is not capable of parsing all grammars. Certain ambiguous grammars may produce what is known as a shift/reduce or reduce/reduce conflict. Yacc will, by default, shift instead of reduce.

  25. Yacc & Parsing Consider the classic shift/reduce conflict example: <ifexp> ::= IF <expr> THEN <stmt> ELSE <stmt> | IF <expr> THEN <stmt> Yacc will have a shift/reduce conflict here, but will go with shift (the top option) by default. It s greedy!

  26. A Lex/Yacc Infix Calculator To demonstrate the utility of Lex and Yacc (or in our case, Flex and Bison) we provide an example infix calculator. Similar to several of the examples provided on the Lex and Yacc manpage at http://dinosaur.compilertools.net, but with added features

  27. A Lex/Yacc Infix Calculator Make sure to read ALL source code comments, particularly those that describe source file organization. Lex definition file: calculator.l Yacc grammar file: calculator.y AST Classes: ast.cc Symbol table: symtab.cc

  28. A Lex/Yacc Infix Calculator The example calculator application uses Lex and Yacc to parse mathematical expressions and produce an Abstract Syntax Tree, which is then used to evaluate those expressions. It allows the =, +, *, -, +, ^, () and unary minus operators, with appropriate levels of binding and precedence. Examine calculator.y, because it is heavily commented.

  29. A Lex/Yacc Infix Calculator The symbol table (implemented here in simple O(n) access time) maps variables to values. Print the AST after every expression evaluation by running calculator with the t flag, e.g. calc t .

  30. A Lex/Yacc Infix Calculator A calculator example. Type 2*2^3/3 and press enter: calc> 2*2^3/3 = 5.333333 3.000 / (/) \ 3.000 / (^) \ 2.000 / (*) \ 2.000

  31. Review What are required: Able to write simple Context Free Grammars, similar to those used in implementing FIZ Able to determine whether a string of tokens is accepted by a grammar Able to show how a string of tokens is parsed into some non-terminal (i.e., draw the parsing tree)

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#