Compiler Structure and Syntax Analysis

B
a
s
i
c
 
P
r
o
g
r
a
m
 
A
n
a
l
y
s
i
s
:
 
A
S
T
CS 4501
Baishakhi Ray
C
o
m
p
i
l
e
r
 
O
v
e
r
v
i
e
w
Abstract Syntax Tree : 
Source code parsed to produce AST
Control Flow Graph: 
AST is transformed to CFG
Data Flow Analysis: 
operates on CFG
T
h
e
 
S
t
r
u
c
t
u
r
e
 
o
f
 
a
 
C
o
m
p
i
l
e
r
Adopted From UC Berkeley: Prof. Bodik  CS 164  Lecture 5
3
scanner
parser
checker
code gen
Source code (stream of characters)
stream of tokens
Abstract Syntax Tree (AST) 
AST with annotations (types, declarations)
Machine/byte code
S
y
n
t
a
c
t
i
c
 
A
n
a
l
y
s
i
s
 
Input: 
sequence of tokens from scanner
Output: 
abstract syntax tree
Actually,
parser first builds a 
parse tree
AST is then built by translating the parse tree
parse tree rarely built explicitly; only determined by, say, how parser pushes
stuff to stack
Adopted From UC Berkeley: Prof. Bodik  CS 164  Lecture 5
4
E
x
a
m
p
l
e
Source Code
   
4*(2+3)
Parser input
NUM(4)  TIMES  LPAR  NUM(2)  PLUS  NUM(3)  RPAR
Parser output (AST):
Adopted From UC Berkeley: Prof. Bodik  CS 164  Lecture 5
5
P
a
r
s
e
 
t
r
e
e
 
f
o
r
 
t
h
e
 
e
x
a
m
p
l
e
:
 
4
*
(
2
+
3
)
Adopted From UC Berkeley: Prof. Bodik  CS 164  Lecture 5
6
leaves are tokens
NUM(4)  TIMES  LPAR  NUM(2)  PLUS  NUM(3)  RPAR
A
n
o
t
h
e
r
 
e
x
a
m
p
l
e
 
Source Code
if (x == y) { a=1; }
Parser input
IF  LPAR  ID  EQ  ID  RPAR  LBR  ID  AS  INT  SEMI  RBR
Parser output (AST):
Adopted From UC Berkeley: Prof. Bodik  CS 164  Lecture 5
7
P
a
r
s
e
 
t
r
e
e
 
f
o
r
 
e
x
a
m
p
l
e
:
 
i
f
 
(
x
=
=
y
)
 
{
a
=
1
;
}
Adopted From UC Berkeley: Prof. Bodik  CS 164  Lecture 5
8
IF LPAR ID == ID RPAR LBR ID = INT SEMI RBR
leaves are tokens
P
a
r
s
e
 
T
r
e
e
Representation of grammars in a tree-like form.
Is a one-to-one mapping from the grammar to a tree-form.
A parse tree pictorially shows how the start
symbol of a grammar derives a string in the
language. … Dragon Book
C Statement:
 return a + 2
a very formal representation that
strictly shows how the parser
understands the statement return a + 2;
A
b
s
t
r
a
c
t
 
S
y
n
t
a
x
 
T
r
e
e
 
(
A
S
T
)
Simplified syntactic representations of the source code, and they're
most often expressed by the data structures of the language used for
implementation
Without showing the whole syntactic clutter, represents the parsed
string in a structured way, discarding all information that may be
important for parsing the string, but isn't needed for analyzing it.
 ASTs differ from parse trees because superficial
distinctions of form, unimportant for translation,
do not appear in syntax trees.. … Dragon Book
C Statement:
 return a + 2
AST
T
h
e
 
S
t
r
u
c
t
u
r
e
 
o
f
 
a
 
C
o
m
p
i
l
e
r
Adopted From UC Berkeley: Prof. Bodik  CS 164  Lecture 5
13
scanner
parser
checker
code gen
Source code (stream of characters)
stream of tokens
Abstract Syntax Tree (AST) 
AST with annotations (types, declarations)
Machine/byte code
Determine 
whether 
source 
is
 
meaningful
Check 
for 
semantic
 
errors
Check 
for 
type
 
errors
Gather 
type 
information 
for 
subsequent
 
stages
Relate variable uses 
to 
their
 
declarations
Some semantic 
analysis 
takes place 
during
 
parsing
Example 
errors 
(from
 
C)
function1 =
 
3.14159;
x = 570 + “hello, world!”  scalar[i];
C
h
e
c
k
e
r
 
(
S
e
m
a
n
t
i
c
 
A
n
a
l
y
s
i
s
)
Symbol
 
Tables
Compile-time data
 
structures
Hold 
names, 
type 
information, and 
scope 
information
 
for
variables
Scopes
A name
 
space
e.g.,  
In 
C, 
each set 
of curly 
braces defines 
a 
new
 
scope
Can create 
a 
separate 
symbol 
table for 
each
 
scope
S
y
n
t
a
c
t
i
c
 
&
 
S
e
m
a
n
t
i
c
 
A
n
a
l
y
s
i
s
Using Symbol
 
Tables
For each variable
 
declaration:
Check 
for 
symbol 
table
 
entry
Add 
new entry (parsing); add 
type 
info 
(semantic
 
analysis)
For each variable
 
use:
Check 
symbol 
table 
entry (semantic
 
analysis)
S
y
n
t
a
c
t
i
c
 
&
 
S
e
m
a
n
t
i
c
 
A
n
a
l
y
s
i
s
D
i
s
a
d
v
a
n
t
a
g
e
s
 
o
f
 
A
S
T
s
AST has many similar forms
E.g., for, while, repeat...until
E.g., if, ?:, switch
Expressions in AST may be complex, nested
(x * y) + (z > 5 ? 12 * z : z + 20)
Want simpler representation for analysis
...at least, for dataflow analysis
17
int x = 1 // what’s the value of x ?
              // AST traversal can give the answer, right?
What about int x; x = 1; or int x= 0; x += 1;  ?
Slide Note
Embed
Share

This content delves into the intricate workings of compilers, focusing on abstract syntax trees (AST) analysis and the structure of a compiler. It covers topics such as source code parsing, control flow graphs, data flow analysis, and syntactic analysis. Through examples and parse trees, it elucidates the process of transforming source code into AST and the role of parsers in generating parse trees. Adopted from UC Berkeley lectures by Prof. Bodik, this comprehensive guide provides insights into compiler organization and operation.

  • Compiler
  • Syntax Analysis
  • Abstract Syntax Tree
  • UC Berkeley
  • Prof. Bodik

Uploaded on Feb 27, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Basic Program Basic Program Analysis: AST Analysis: AST CS 4501 Baishakhi Ray

  2. Compiler Overview Compiler Overview Abstract Syntax Tree : Source code parsed to produce AST Control Flow Graph: AST is transformed to CFG Data Flow Analysis: operates on CFG

  3. The Structure of a Compiler The Structure of a Compiler Source code (stream of characters) scanner stream of tokens parser Abstract Syntax Tree (AST) checker AST with annotations (types, declarations) code gen Machine/byte code Adopted From UC Berkeley: Prof. Bodik CS 164 Lecture 5 3

  4. Syntactic Analysis Syntactic Analysis Input: sequence of tokens from scanner Output: abstract syntax tree Actually, parser first builds a parse tree AST is then built by translating the parse tree parse tree rarely built explicitly; only determined by, say, how parser pushes stuff to stack 4 Adopted From UC Berkeley: Prof. Bodik CS 164 Lecture 5

  5. Example Example Source Code Parser input 4*(2+3) NUM(4) TIMES LPAR NUM(2) PLUS NUM(3) RPAR Parser output (AST): * + NUM(4) NUM(2) NUM(3) 5 Adopted From UC Berkeley: Prof. Bodik CS 164 Lecture 5

  6. Parse tree for the example: 4*(2+3) Parse tree for the example: 4*(2+3) EXPR EXPR EXPR NUM(4) TIMES LPAR NUM(2) PLUS NUM(3) RPAR leaves are tokens 6 Adopted From UC Berkeley: Prof. Bodik CS 164 Lecture 5

  7. Another example Another example Source Code if (x == y) { a=1; } Parser input IF LPAR ID EQ ID RPAR LBR ID AS INT SEMI RBR Parser output (AST): IF-THEN == = ID ID ID INT 7 Adopted From UC Berkeley: Prof. Bodik CS 164 Lecture 5

  8. Parse tree for example: if (x==y) {a=1;} Parse tree for example: if (x==y) {a=1;} STMT BLOCK STMT EXPR EXPR IF LPAR ID == ID RPAR LBR ID = INT SEMI RBR leaves are tokens 8 Adopted From UC Berkeley: Prof. Bodik CS 164 Lecture 5

  9. Parse Tree Parse Tree Representation of grammars in a tree-like form. A parse tree pictorially shows how the start symbol of a grammar derives a string in the language. Dragon Book Is a one-to-one mapping from the grammar to a tree-form.

  10. C Statement: return a + 2 a very formal representation that strictly shows how the parser understands the statement return a + 2;

  11. Abstract Syntax Tree (AST) Abstract Syntax Tree (AST) Simplified syntactic representations of the source code, and they're most often expressed by the data structures of the language used for implementation ASTs differ from parse trees because superficial distinctions of form, unimportant for translation, do not appear in syntax trees.. Dragon Book Without showing the whole syntactic clutter, represents the parsed string in a structured way, discarding all information that may be important for parsing the string, but isn't needed for analyzing it.

  12. AST C Statement: return a + 2

  13. The Structure of a Compiler The Structure of a Compiler Source code (stream of characters) scanner stream of tokens parser Abstract Syntax Tree (AST) checker AST with annotations (types, declarations) code gen Machine/byte code Adopted From UC Berkeley: Prof. Bodik CS 164 Lecture 5 13

  14. Checker (Semantic Analysis) Checker (Semantic Analysis) Determine whether source is meaningful Check for semanticerrors Check for type errors Gather type information for subsequent stages Relate variable uses to their declarations Some semantic analysis takes place during parsing Example errors (from C) function1 = 3.14159; x = 570 + hello, world! scalar[i];

  15. Syntactic & Semantic Analysis Syntactic & Semantic Analysis SymbolTables Compile-time data structures Hold names, type information, and scope information for variables Scopes A namespace e.g., In C, each set of curly braces defines a new scope Can create a separate symbol table for each scope

  16. Syntactic & Semantic Analysis Syntactic & Semantic Analysis Using SymbolTables For each variable declaration: Check for symbol table entry Add new entry (parsing); add type info (semantic analysis) For each variableuse: Check symbol table entry (semantic analysis)

  17. Disadvantages of ASTs Disadvantages of ASTs AST has many similar forms E.g., for, while, repeat...until E.g., if, ?:, switch int x = 1 // what s the value of x ? // AST traversal can give the answer, right? What about int x; x = 1; or int x= 0; x += 1; ? Expressions in AST may be complex, nested (x * y) + (z > 5 ? 12 * z : z + 20) Want simpler representation for analysis ...at least, for dataflow analysis 17

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#