Java CUP Parser Generators

 
Java CUP
 
 
1
 
Last Time
 
What do we want?
An AST
When do we want it?
Now!
 
2
 
This Time
 
A little review of ASTs
The philosophy and use of a 
Parser Generator
 
3
Translating Lists
4
CFG
IdList
 -> 
id
        | 
IdList
 
comma
 
id
 
AST
IdNode
“x”
 
Input
x, y, z
IdNode
“y”
IdNode
“z”
IdList
IdList
IdList
id
“x”
,
,
id
“y”
id
“z”
Parser Generators
 
Tools that take an SDT spec and build an AST
YACC: Yet Another Compiler Compiler
Java CUP: Constructor of Useful Parsers
Conceptually similar to JLex
Input: Language rules + actions
Output: Java code
5
Java CUP
 
Parser spec
(xxx.cup)
 
Parser Source
(parser.java)
 
Symbols
(sym.java)
Java CUP
Parser.java
Constructor takes arg of type Scanner (i.e., yylex)
Contains a parsing method
return: Symbol whose value contains
translation of root nonterminal
Uses output of JLex
Depends on scanner and TokenVals
sym.java defines the communication
    language
Uses defs of AST classes
Also in xxx.cup
6
Java CUP
Parser spec
(xxx.cup)
Parser Source
(parser.java)
Symbols
(sym.java)
Defines the token
names used by both
JLex and Java CUP
Java CUP Input Spec
 
Terminal & nonterminal
declarations
Optional precedence
and associativity
declarations
Grammar with rules
and actions [no actions
shown here]
7
 
Grammar rules
Expr ::= 
intliteral
      |  
id
      
|  Expr 
plus
 Expr
      |  Expr 
times
 Expr
      |  
lparens
 Expr 
rparens
 
Terminal and Nonterminals
terminal 
intliteral;
terminal 
id;
terminal 
plus;
terminal
 minus;
terminal 
times;
terminal 
lparen
;
terminal 
rparen
;
non terminal Expr;
 
Precedence and Associativity
precedence left plus, minus;
precedence left times;
prededence nonassoc less;
 
lowest
precedence
first
 
Java CUP Example
 
Assume ExpNode subclasses
PlusNode
, 
TimesNode
 have
2 children for operands
IdNode
 has a String field
IntLitNode
 has an int field
Assume Token classes
IntLitTokenVal
 with field
intVal for the value of the
integer literal
IdTokenVal
 with field idVal
for the actual identifier
 
8
 
Step 1:
 Add types to terminals
 
terminal IntLitTokenVal intliteral;
terminal IdTokenVal id;
terminal plus;
terminal times;
terminal lparen;
terminal rparen;
 
non terminal ExpNode expr;
 
Java CUP Example
 
9
 
Expr ::= intliteral
         {:
 
  :}
      |  id
         {:
 
         :}
      |  Expr plus Expr
         {:
 
         :}
      |  Expr times Expr
         {:
 
         :}
      |  lparen Expr rparen
         {:
 
         :}
      ;
 
Java CUP Example
 
10
 
Expr ::= intliteral:i
         {:
            RESULT = new IntLitNode(i.intVal);
 
  :}
      |  id
         {:
 
         :}
      |  Expr plus Expr
         {:
 
         :}
      |  Expr times Expr
         {:
 
         :}
      |  lparen Expr rparen
         {:
 
         :}
      ;
 
Java CUP Example
 
11
 
Expr ::= intliteral:i
         {:
            RESULT = new IntLitNode(i.intVal);
 
  :}
      |  id:i
         {:
  
RESULT = new IdNode(i.idVal);
         :}
      |  Expr:e1 plus Expr:e2
         {:
  
RESULT = new PlusNode(e1,e2);
         :}
      |  Expr:e1 times Expr:e2
         {:
  
RESULT = new TimesNode(e1,e2);
         :}
      |  lparen Expr:e rparen
         {:
  
RESULT = e;
         :}
      ;
 
Java CUP Example
 
12
 
Input: 2 +     3
Expr
Expr
Expr
plus
intliteral
intliteral
IntLitTokenVal
linenum:  …
charnum: …
intVal:
IntLitTokenVal
linenum:  …
charnum: …
intVal:
IntLitNode
val:
IntLitNode
val:
PlusNode
left:
right:
2
3
2
3
 
Purple = Terminal Token (Built by Scanner)
Blue = Symbol (Built by Parser)
 
Handling Lists in Java CUP
 
13
 
stmtList ::= stmtList:sl stmt:s
         {:  sl.addToEnd(s);
             RESULT = sl;
         :}
         |   /* epsilon */
         {:  RESULT = new Sequence();
         :}
         ;
 
Another issue: left-recursion (as above) or right-recursion?
For top-down parsers, must use right-recursion
Left-recursion causes an infinite loop
With Java CUP, 
use left-recursion
!
Java CUP is a bottom-up parser (LALR(1))
Left-recursion allows a bottom-up parser to recognize a list s1, s2, s3, s4
     with no extra stack space:
 
recognize instance of “stmtList ::= epsilon” (current nonterminal stmtList)
 
recognize instance of “stmtList ::= stmtList:current stmt:s1”         [s1]
          recognize instance of “stmtList ::= stmtList:current stmt:s2”         [s1, s2]
          recognize instance of “stmtList ::= stmtList:current stmt:s3”         [s1, s2, s3]
          recognize instance of “stmtList ::= stmtList:current stmt:s4”         [s1, s2, s3, s4]
Handling Unary Minus
/* precedences and associativities of operators */
precedence left PLUS, MINUS;
precedence left TIMES, DIVIDE;
precedence nonassoc UMINUS;     // Also used for precedence of unary minus
exp  ::=  . . .
     |   
MINUS
 exp:e
     {:  RESULT = new UnaryMinusNode(e);
     :}    %prec 
UMINUS
  /* artificially elevate the precedence to that of UMINUS */
     |   exp:e1 
PLUS
 exp:e2
     {:  RESULT = new PlusNode(e1, e2);
     :}
     |   exp:e1 
MINUS
 exp:e2
     {:  RESULT = new MinusNode(e1, e2);
     . . .
     ;
14
The precedence of a rule is that of the 
last
token of the rule, unless assigned a specific
precedence via “%prec <TOKEN>”
UMINUS is a phony token never returned by
the scanner.  UMINUS is solely for the
purpose of being used in “%prec UMINUS”
 
Java CUP Demo
 
15
 
 
Slide Note
Embed
Share

Explore the world of Java CUP parser generators, focusing on creating Abstract Syntax Trees (ASTs), translating lists in context-free grammar, and the use of parser tools like YACC and Java CUP. Dive into creating grammars, token classes, and AST classes to build parsers effectively.

  • Java CUP
  • Parser Generators
  • Abstract Syntax Trees
  • Grammar Rules
  • YACC

Uploaded on Sep 19, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Java CUP 1

  2. Last Time What do we want? An AST When do we want it? Now! 2

  3. This Time A little review of ASTs The philosophy and use of a Parser Generator 3

  4. Translating Lists CFG IdList -> id | IdListcommaid Input x, y, z IdList AST IdNode z IdNode x IdNode y IdList , id z IdList , id y id x 4

  5. Parser Generators Tools that take an SDT spec and build an AST YACC: Yet Another Compiler Compiler Java CUP: Constructor of Useful Parsers Conceptually similar to JLex Input: Language rules + actions Output: Java code Parser spec (xxx.cup) Java CUP Parser Source (parser.java) Symbols (sym.java) 5

  6. Java CUP Parser.java Constructor takes arg of type Scanner (i.e., yylex) Contains a parsing method return: Symbol whose value contains translation of root nonterminal Uses output of JLex Depends on scanner and TokenVals sym.java defines the communication language Uses defs of AST classes Also in xxx.cup Parser spec (xxx.cup) Java CUP Parser Source (parser.java) Symbols (sym.java) Defines the token names used by both JLex and Java CUP 6

  7. Java CUP Input Spec Grammar rules Expr ::= intliteral | id | Expr plus Expr | Expr times Expr | lparens Expr rparens Terminal and Nonterminals terminal intliteral; terminal id; terminal plus; terminal minus; terminal times; terminal lparen; terminal rparen; non terminal Expr; Terminal & nonterminal declarations Optional precedence and associativity declarations Grammar with rules and actions [no actions shown here] lowest precedence first Precedence and Associativity precedence left plus, minus; precedence left times; prededence nonassoc less; 7

  8. Java CUP Example Assume ExpNode subclasses PlusNode, TimesNode have 2 children for operands IdNode has a String field IntLitNode has an int field Assume Token classes IntLitTokenVal with field intVal for the value of the integer literal IdTokenVal with field idVal for the actual identifier Step 1: Add types to terminals terminal IntLitTokenVal intliteral; terminal IdTokenVal id; terminal plus; terminal times; terminal lparen; terminal rparen; non terminal ExpNode expr; 8

  9. Java CUP Example Expr ::= intliteral {: :} | id {: :} | Expr plus Expr {: :} | Expr times Expr {: :} | lparen Expr rparen {: :} ; 9

  10. Java CUP Example Expr ::= intliteral:i {: RESULT = new IntLitNode(i.intVal); :} | id {: :} | Expr plus Expr {: :} | Expr times Expr {: :} | lparen Expr rparen {: :} ; 10

  11. Java CUP Example Expr ::= intliteral:i {: RESULT = new IntLitNode(i.intVal); :} | id:i {: RESULT = new IdNode(i.idVal); :} | Expr:e1 plus Expr:e2 {: RESULT = new PlusNode(e1,e2); :} | Expr:e1 times Expr:e2 {: RESULT = new TimesNode(e1,e2); :} | lparen Expr:e rparen {: RESULT = e; :} ; 11

  12. Java CUP Example PlusNode left: right: Input: 2 + 3 Expr IntLitNode val: 2 IntLitNode val: Expr plus Expr 3 intliteral intliteral IntLitTokenVal linenum: charnum: intVal: IntLitTokenVal linenum: charnum: intVal: 2 3 Purple = Terminal Token (Built by Scanner) Blue = Symbol (Built by Parser) 12

  13. Handling Lists in Java CUP stmtList ::= stmtList:sl stmt:s {: sl.addToEnd(s); RESULT = sl; :} | /* epsilon */ {: RESULT = new Sequence(); :} ; Another issue: left-recursion (as above) or right-recursion? For top-down parsers, must use right-recursion Left-recursion causes an infinite loop With Java CUP, use left-recursion! Java CUP is a bottom-up parser (LALR(1)) Left-recursion allows a bottom-up parser to recognize a list s1, s2, s3, s4 with no extra stack space: recognize instance of stmtList ::= epsilon (current nonterminal stmtList) recognize instance of stmtList ::= stmtList:current stmt:s1 [s1] recognize instance of stmtList ::= stmtList:current stmt:s2 [s1, s2] recognize instance of stmtList ::= stmtList:current stmt:s3 [s1, s2, s3] recognize instance of stmtList ::= stmtList:current stmt:s4 [s1, s2, s3, s4] 13

  14. UMINUS is a phony token never returned by Handling Unary Minus the scanner. UMINUS is solely for the purpose of being used in %prec UMINUS /* precedences and associativities of operators */ precedence left PLUS, MINUS; precedence left TIMES, DIVIDE; precedence nonassoc UMINUS; // Also used for precedence of unary minus token of the rule, unless assigned a specific The precedence of a rule is that of the last precedence via %prec <TOKEN> exp ::= . . . | MINUS exp:e {: RESULT = new UnaryMinusNode(e); :} %prec UMINUS /* artificially elevate the precedence to that of UMINUS */ | exp:e1 PLUS exp:e2 {: RESULT = new PlusNode(e1, e2); :} | exp:e1 MINUS exp:e2 {: RESULT = new MinusNode(e1, e2); . . . ; 14

  15. Java CUP Demo 15

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#