Introduction to Parsing and Scanning

 
Parsing & Scanning
Lecture 1
 
COMP 202
10/25/2004
Derek Ruths
 
druths@rice.edu
Office: DH Rm #3010
 
Applications: Speech Recognition, Games, IDEs, Compilers/Interpreters
What is Parsing?
parse (v.) to break down into its component parts of speech with an explanation of the form, function, and syntactical relationship of each part.
--The American Heritage Dictionary
 
High-Level View

Text, Source Code, Speech -> Scanner -> Parser -> Structured Representation (e.g. Abstract Syntax Tree)
 
Overview
 
Lecture 1:
  Intro to scanning
  Intro to parsing
  Basics of building a scanner in Java
Lab:
  Implementing a scanner
Lecture 2:
  Basic Parsing Theory
  Design of an Object-Oriented Parser
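Scanning: Breaking Things Down

Character Stream: 3 + x
Scanner output (Token List): <NUM, "3">, <PLUS, "+">, <ID, "x">, <EOF>
Each token pairs a Token Type (e.g. NUM) with a Token Value (e.g. "3").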
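
The character-stream-to-token step on this slide can be sketched by hand. This is a minimal illustration, not the course's scanner: the names TokenType, Token and HandScanner are hypothetical, and it only handles digits, letters and '+'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token types, mirroring the slide's NUM, PLUS, ID and EOF.
enum TokenType { NUM, PLUS, ID, EOF }

// A token pairs a type with the text it was built from (hypothetical class).
final class Token {
    final TokenType type;
    final String value;

    Token(TokenType type, String value) {
        this.type = type;
        this.value = value;
    }

    @Override
    public String toString() {
        return type == TokenType.EOF ? "<EOF>" : "<" + type + ", \"" + value + "\">";
    }
}

final class HandScanner {
    // Walks the input one character at a time and groups characters into tokens.
    static List<Token> scan(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens but is not a token itself
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add(new Token(TokenType.NUM, input.substring(start, i)));
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                tokens.add(new Token(TokenType.ID, input.substring(start, i)));
            } else if (c == '+') {
                tokens.add(new Token(TokenType.PLUS, "+"));
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at index " + i);
            }
        }
        tokens.add(new Token(TokenType.EOF, ""));
        return tokens;
    }

    public static void main(String[] args) {
        // prints [<NUM, "3">, <PLUS, "+">, <ID, "x">, <EOF>]
        System.out.println(scan("3 + x"));
    }
}
```

The scanner only groups characters; it does not check that the tokens form a sensible expression. That is the parser's job.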
 
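Scanning: Token List

Token List: each entry has the form <Token Type: [Token Descriptor]>
<NUM: [("0" - "9")+]>
<OP:  ["+", "*"]>
<ID:  [alpha (alphaNum)*]>

Tolkien List (not to be confused with the Token List): Tolkien Type = Wizard, Tolkien Descriptor = Magical Powers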
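
One way to read the three token descriptors above is as regular expressions. The sketch below uses java.util.regex with one named group per token type. It assumes that "alpha" means ASCII letters and "alphaNum" means ASCII letters and digits, which the slide does not spell out, and RegexScanner is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexScanner {
    // One alternative per descriptor; optional leading whitespace is skipped.
    private static final Pattern TOKEN = Pattern.compile(
            "\\s*(?:(?<NUM>[0-9]+)|(?<OP>[+*])|(?<ID>[A-Za-z][A-Za-z0-9]*))");

    static List<String> scan(String input) {
        String text = input.trim(); // so only tokens (and inner spaces) remain
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        int pos = 0;
        while (pos < text.length()) {
            m.region(pos, text.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("No token matches at index " + pos);
            }
            if (m.group("NUM") != null) {
                tokens.add("<NUM, \"" + m.group("NUM") + "\">");
            } else if (m.group("OP") != null) {
                tokens.add("<OP, \"" + m.group("OP") + "\">");
            } else {
                tokens.add("<ID, \"" + m.group("ID") + "\">");
            }
            pos = m.end(); // continue right after the token just matched
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [<ID, "x1">, <OP, "*">, <NUM, "42">]
        System.out.println(scan("x1 * 42"));
    }
}
```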
 
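Parsing: Organizing Things

Tokens: <NUM, "3">, <PLUS, "+">, <ID, "x">, <EOF>
Parser (driven by a Grammar) output: an Abstract Syntax Tree with + at the root and 3 and x as its children.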
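
To make the token-to-tree step concrete, here is a small sketch of a parser for sums. The grammar (expr := operand ("+" operand)*) is an assumption made for illustration, since the slide only shows the picture; Node, Leaf, Add and TinyParser are hypothetical names, and tokens are passed as plain strings to keep the example short.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical AST node types; the slides do not define these.
abstract class Node { }

final class Leaf extends Node {
    final String text;
    Leaf(String text) { this.text = text; }
    @Override public String toString() { return text; }
}

final class Add extends Node {
    final Node left;
    final Node right;
    Add(Node left, Node right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(+ " + left + " " + right + ")"; }
}

final class TinyParser {
    private final List<String> tokens;
    private int pos = 0;

    TinyParser(List<String> tokens) { this.tokens = tokens; }

    // Assumed grammar: expr := operand ("+" operand)*
    Node parseExpr() {
        Node left = parseOperand();
        while (pos < tokens.size() && tokens.get(pos).equals("+")) {
            pos++; // consume "+"
            left = new Add(left, parseOperand());
        }
        if (pos != tokens.size()) {
            throw new IllegalStateException("Unexpected token: " + tokens.get(pos));
        }
        return left;
    }

    // An operand is a number or an identifier.
    private Node parseOperand() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("Expected an operand but reached end of input");
        }
        String t = tokens.get(pos);
        if (!t.matches("[A-Za-z0-9]+")) {
            throw new IllegalStateException("Expected an operand but found: " + t);
        }
        pos++;
        return new Leaf(t);
    }

    public static void main(String[] args) {
        // prints (+ 3 x): "+" is the root, 3 and x are its children
        System.out.println(new TinyParser(Arrays.asList("3", "+", "x")).parseExpr());
    }
}
```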
 
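Manual Scanning in Java

Tokenizing Example

import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

public static final int PLUS = '+';

public void tokenize(String str) throws IOException {
    // Initialization: wrap the input in a Reader and hand it to the tokenizer.
    StreamTokenizer stok = new StreamTokenizer(new StringReader(str));
    int token;

    // Configuration: report '+' as its own token and recognize numbers.
    stok.ordinaryChar(PLUS);
    stok.parseNumbers();

    // Scanning: call nextToken() until the end of the stream.
    while ((token = stok.nextToken()) != StreamTokenizer.TT_EOF) {
        switch (token) {
            case StreamTokenizer.TT_WORD:
                System.out.println("WORD = " + stok.sval); break;
            case StreamTokenizer.TT_NUMBER:
                System.out.println("NUM = " + stok.nval); break;
            case PLUS:
                System.out.println("PLUS"); break;
        }
    }
}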
 
java.io.StreamTokenizer
Configuration: It's like programming a VCR!
 
java.io.StreamTokenizer
Scanning: It's like calling "nextToken" over and over!

Call int StreamTokenizer.nextToken() to get the next token.
String StreamTokenizer.sval (public field) holds the token value for TT_WORD and quote token types.
double StreamTokenizer.nval (public field) holds the token value for the TT_NUMBER token type.
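
The relationship between the return value of nextToken() and the sval and nval fields can be sketched as follows. TokenValues is a hypothetical class name. The example relies on StreamTokenizer's default configuration, in which '"' is a quote character and '=' is an ordinary character. It wraps IOException in UncheckedIOException only to keep the calling code short.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class TokenValues {
    // Describes each token and shows which field carries its value.
    static List<String> describe(String input) {
        StreamTokenizer stok = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            int token;
            while ((token = stok.nextToken()) != StreamTokenizer.TT_EOF) {
                if (token == StreamTokenizer.TT_WORD) {
                    out.add("WORD " + stok.sval);       // words: value in sval
                } else if (token == StreamTokenizer.TT_NUMBER) {
                    out.add("NUM " + stok.nval);        // numbers: value in nval (a double)
                } else if (token == '"') {
                    out.add("QUOTED " + stok.sval);     // quoted strings: type is the quote char, body in sval
                } else {
                    out.add("CHAR " + (char) token);    // ordinary chars: the type is the character itself
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [WORD width, CHAR =, NUM 42.0, QUOTED hello there]
        System.out.println(describe("width = 42 \"hello there\""));
    }
}
```

Note that nval is a double, so the integer 42 is reported as 42.0.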
 
java.io.StreamTokenizer
Initialization: It's like using Java I/O!

Constructor: StreamTokenizer(Reader r)

java.io.Reader - abstract class for reading characters
FileReader - reads characters from a File
StringReader - reads characters from a String
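
Because the constructor takes any Reader, the same scanning code works for strings and files. The sketch below illustrates this with a hypothetical helper, countTokens, in a hypothetical class ReaderSources. The temporary file exists only to make the FileReader case runnable.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ReaderSources {
    // Counts tokens from any character source; the tokenizer does not care where they come from.
    static int countTokens(Reader source) {
        try {
            StreamTokenizer stok = new StreamTokenizer(source);
            int count = 0;
            while (stok.nextToken() != StreamTokenizer.TT_EOF) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Source 1: a String
        System.out.println(countTokens(new StringReader("3 + x")));

        // Source 2: a File (written to a temporary file for this demo)
        Path tmp = Files.createTempFile("tokens", ".txt");
        try {
            Files.write(tmp, "3 + x".getBytes(StandardCharsets.US_ASCII));
            try (Reader file = new FileReader(tmp.toFile())) {
                System.out.println(countTokens(file));
            }
        } finally {
            Files.delete(tmp);
        }
    }
}
```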
 
Questions?

This lecture introduces parsing and scanning: breaking input down into tokens, scanning manually in Java with java.io.StreamTokenizer, and organizing tokens into a structured representation such as an Abstract Syntax Tree. It also outlines the follow-up lab and lecture (implementing a scanner, basic parsing theory, and the design of an object-oriented parser) and lists speech recognition, games, IDEs, compilers, and interpreters as applications.

  • Parsing
  • Scanning
  • Java
  • Abstract Syntax Tree
  • Programming

Uploaded on Mar 01, 2025 | 1 Views



Presentation Transcript


  1. Parsing & Scanning Lecture 1 COMP 202 10/25/2004 Derek Ruths druths@rice.edu Office: DH Rm #3010

  2. Speech-Recognition, Games, IDEs, Compilers/Interpreters

  3. What is Parsing? parse (v.) to break down into its component parts of speech with an explanation of the form, function, and syntactical relationship of each part. --The American Heritage Dictionary (slide labels: Scanning, Parsing)

  4. High-Level View: Text, Source Code, Speech -> Scanner -> Parser -> Structured Representation (e.g. Abstract Syntax Tree)

  5. Overview. Lecture 1: Intro to scanning; Intro to parsing; Basics of building a scanner in Java. Lab: Implementing a scanner. Lecture 2: Basic Parsing Theory; Design of an Object-Oriented Parser.

  6. High-Level View: Text, Source Code, Speech -> Scanner -> Parser -> Structured Representation (e.g. Abstract Syntax Tree)

  7. Scanning: Breaking Things Down. Character Stream: 3 + x -> Scanner -> Tokens (Token List): <NUM, "3">, <PLUS, "+">, <ID, "x">, <EOF>. Each token has a Token Type and a Token Value.

  8. Scanning: Token List. Each Token List entry has the form <Token Type: [Token Descriptor]>: <NUM: [("0" - "9")+]>, <OP: ["+", "*"]>, <ID: [alpha (alphaNum)*]>. Tolkien List: Tolkien Type = Wizard, Tolkien Descriptor = Magical Powers.

  9. High-Level View (you saw this earlier): Text, Source Code, Speech -> Scanner -> Parser -> Structured Representation (e.g. Abstract Syntax Tree)

  10. Parsing: Organizing Things. Tokens: <NUM, "3">, <PLUS, "+">, <ID, "x">, <EOF> -> Parser (driven by a Grammar) -> Abstract Syntax Tree with + at the root and 3 and x as its children.

  11. Manual Scanning in Java. Character Stream: 3 + x -> Scanner -> Tokens (Token List): <NUM, "3">, <PLUS, "+">, <ID, "x">, <EOF>. Each token has a Token Type and a Token Value.

  12. Tokenizing Example

      public static final int PLUS = '+';

      public void tokenize(String str) throws IOException {
          // Initialization
          StreamTokenizer stok = new StreamTokenizer(new StringReader(str));
          int token;

          // Configuration
          stok.ordinaryChar(PLUS);
          stok.parseNumbers();

          // Scanning
          while ((token = stok.nextToken()) != StreamTokenizer.TT_EOF) {
              switch (token) {
                  case StreamTokenizer.TT_WORD:
                      System.out.println("WORD = " + stok.sval); break;
                  case StreamTokenizer.TT_NUMBER:
                      System.out.println("NUM = " + stok.nval); break;
                  case PLUS:
                      System.out.println("PLUS"); break;
              }
          }
      }

  13. Tokenizing Example (code repeated from slide 12)

  14. java.io.StreamTokenizer Configuration: It's like programming a VCR!

      Token Type (int)            Token Desc.                 How to Customize
      StreamTokenizer.TT_WORD     a word (no spaces)          void wordChars(int low, int high)
      int qch                     a string quoted by qch      void quoteChar(int qch), e.g. quoteChar(':') makes :hello there: a quoted string
      StreamTokenizer.TT_NUMBER   numbers                     void parseNumbers()
      int ch                      the character value of ch   void ordinaryChar(int ch), e.g. ordinaryChar('+') yields (int) '+'
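
The configuration calls listed on this slide can be tried together in one small sketch. ConfigDemo is a hypothetical class name, and the choice of '_' as an extra word character is an illustrative assumption; the other calls follow the slide's examples.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class ConfigDemo {
    static List<String> scan(String input) {
        StreamTokenizer stok = new StreamTokenizer(new StringReader(input));

        // Configuration, one call per row of the slide's table.
        stok.wordChars('_', '_');  // treat '_' as part of a word (illustrative choice)
        stok.quoteChar(':');       // :hello there: becomes a single quoted-string token
        stok.parseNumbers();       // recognize numbers (also on by default)
        stok.ordinaryChar('+');    // '+' is returned as its own character value

        List<String> out = new ArrayList<>();
        try {
            int token;
            while ((token = stok.nextToken()) != StreamTokenizer.TT_EOF) {
                if (token == StreamTokenizer.TT_WORD) {
                    out.add("WORD " + stok.sval);
                } else if (token == StreamTokenizer.TT_NUMBER) {
                    out.add("NUM " + stok.nval);
                } else if (token == ':') {
                    out.add("QUOTED " + stok.sval);
                } else {
                    out.add("CHAR " + (char) token);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [WORD my_var, CHAR +, NUM 7.0, QUOTED hello there]
        System.out.println(scan("my_var + 7 :hello there:"));
    }
}
```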

  15. Tokenizing Example (code repeated from slide 12)

  16. java.io.StreamTokenizer Scanning: It's like calling "nextToken" over and over! Call int StreamTokenizer.nextToken() to get the next token. String StreamTokenizer.sval (public field) holds the token value for TT_WORD and quote token types. double StreamTokenizer.nval (public field) holds the token value for the TT_NUMBER token type.

  17. Tokenizing Example (code repeated from slide 12)

  18. java.io.StreamTokenizer Initialization: It's like using Java I/O! Constructor: StreamTokenizer(Reader r). java.io.Reader - abstract class for reading characters. FileReader - reads characters from a File. StringReader - reads characters from a String.

  19. Questions?
