Understanding Lexical Analysis in Computer Science

lexical analysis n.w
1 / 26
Embed
Share

Dive into the world of lexical analysis with insights from Dr. Alok Kumar, covering topics such as the role of lexical analyzers, tokens, patterns, lexemes, errors, and more. Learn about the importance of separating lexical analysis from parsing, the attributes for tokens, and examples illustrating the concepts discussed.

  • Lexical Analysis
  • Tokens
  • Patterns
  • Lexemes
  • Computer Science

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Lexical Analysis Dr. Alok Kumar Department of Computer Science and Engineering UIET, CSJM University, Kanpur

  2. Topic Covered Topic Covered Role of Lexical Analyzer Tokens, Patterns, Lexemes Lexical Errors and Recovery Specification of Tokens Recognition of Tokens Finite Automata Toollex Conclusion Lexical Analysis- Dr. Alok Kumar 2

  3. Lexical Lexical analyzer analyzer The main task of lexical analysis is to read input characters in the code and produce tokens. "Get next token" is a command which is sent from the parser to the lexical analyzer. On receiving this command, the lexical analyzer scans the input until it finds the next token. Lexical Analysis- Dr. Alok Kumar 3

  4. Role Role of of lexical lexical analyzer analyzer Lexical Analysis- Dr. Alok Kumar 4

  5. Why Why to and and parsing parsing to separate separate Lexical Lexical analysis analysis Simplicity of design Improving compiler efficiency Enhancing compiler portability Lexical Analysis- Dr. Alok Kumar 5

  6. Tokens, Tokens, Patterns Patterns and and Lexemes Lexemes A token is a pair a token name and an optional token value A pattern is a description of the form that the lexemes of a token may take A lexeme is a sequence of characters in the source program that matches the pattern for a token Lexical Analysis- Dr. Alok Kumar 6

  7. E Ex xample ample Lexical Analysis- Dr. Alok Kumar 7

  8. Attributes Attributes for for tokens tokens E = M * C ** 2 <id, pointer to symbol table entry for E> <assign-op> <id, pointer to symbol table entry for M> <mult-op> <id, pointer to symbol table entry for C> <exp-op> <number, integer value 2> Lexical Analysis- Dr. Alok Kumar 8

  9. Lexical Lexical errors errors Some errors are out of power of lexical analyzer to recognize: fi (a == f(x)) However it may be able to recognize errors like: d = 2r Such errors are recognized when no pattern for tokens matches a character sequence Lexical Analysis- Dr. Alok Kumar 9

  10. Error Error recovery recovery Panic mode: successive characters are ignored until we reach to a well formed token Delete one character from the remaining input Insert a missing character into the remaining input Replace a character by another character Transpose two adjacent characters Lexical Analysis- Dr. Alok Kumar 10

  11. Input Input Buffering Buffering Sometimes lexical analyzer needs to look ahead some symbols to decide about the token to return In C language: we need to look after -, = or < to decide what token to return In Fortran: DO 5 I = 1.25 We need to introduce a two buffer scheme to handle large look-aheads safely Lexical Analysis- Dr. Alok Kumar 11

  12. Specification Specification of of tokens tokens In theory of compilation regular expressions are used to formalize the specification of tokens Regular expressions are means for specifying regular languages Example: letter(letter | digit)* Each regular expression is a pattern specifying the form of strings Lexical Analysis- Dr. Alok Kumar 12

  13. Regular Regular Expressions Expressions is a regular expression denoting the language L( ) = { }, containing only the empty string If a is a symbol in then a is a regular expression, L(a) = {a} If r and s are two regular expressions with languages L(r) and L(s), then r|s is a regular expression denoting the language L(r) L(s), containing all strings of L(r) and L(s) rs is a regular expression denoting the language L(r)L(s), created by concatenating the strings of L(s) to L(r) r* is a regular expression denoting (L(r))*, the set containing zero or more occurrences of the strings of L(r) (r) is a regular expression corresponding to the language L(r) Lexical Analysis- Dr. Alok Kumar 13

  14. Regular Regular definitions definitions d1 -> r1 d2 -> r2 dn -> rn Example: letter_ -> A | B | | Z | a | b | | Z | _ digit -> 0 | 1 | | 9 Id -> letter_ (letter_ | digit)* Lexical Analysis- Dr. Alok Kumar 14

  15. Extensions Extensions One or more instances: (r)+ Zero of one instances: r? Character classes: [abc] Example: letter_-> [A-Za-z_] digit-> [0-9] ID-> letter_(letter_|digit)* Lexical Analysis- Dr. Alok Kumar 15

  16. Examples Examples with with = = {0,1} {0,1} (0|1)*: All binary strings including the empty string (0|1)(0|1)*: All nonempty binary strings 0(0|1)*0: All binary strings of length at least 2, starting and ending with 0s (0|1)*0(0|1)(0|1)(0|1): All binary strings with at least three characters in which the third-last character is always 0 0*10*10*10*: All binary strings possessing exactly three 1s Lexical Analysis- Dr. Alok Kumar 16

  17. Recognition Recognition of of tokens tokens Starting point is the language grammar to understand the tokens: stmt -> if expr then stmt | if expr then stmt else stmt | expr -> term relop term |term term -> id | number Lexical Analysis- Dr. Alok Kumar 17

  18. Recognition Recognition of The next step is to formalize the patterns: digit -> [0-9] Digits -> digit+ number -> digit(.digits)? (E[+-]? Digit)? letter -> [A-Za-z_] id -> letter (letter|digit)* If -> if Then -> then Else -> else Relop -> < | > | <= | >= | = | <> We also need to handle whitespaces: ws -> (blank | tab | newline)+ of tokens tokens (cont.) (cont.) Lexical Analysis- Dr. Alok Kumar 18

  19. Transition Transition diagrams diagrams Transition diagram for relop Lexical Analysis- Dr. Alok Kumar 19

  20. Transition Transition diagrams diagrams (cont.) (cont.) Transition diagram for reserved words and identifiers Transition diagram for unsigned numbers Lexical Analysis- Dr. Alok Kumar 20

  21. Architecture Architecture of based lexical based lexical analyzer of a a transition transition- -diagram analyzer diagram- - TOKEN getRelop() { TOKEN retToken = new (RELOP) while (1) { /* repeat character processing until a return or failure occurs */ switch(state) { case 0: c= nextchar(); if (c == < ) state = 1; else if (c == = ) state = 5; else if (c == > ) state = 6; else fail(); /* lexeme is not a relop */ break; case 1: case 8: retract(); retToken.attribute = GT; return(retToken); } Lexical Analysis- Dr. Alok Kumar 21

  22. Finite Finite Automata Automata Regular expressions = specification Finite automata = implementation A finite automaton consists of An input alphabet A set of states S A start state n A set of accepting states F S A set of transitions state state Lexical Analysis- Dr. Alok Kumar 22

  23. Lexical Analyzer Generator Lexical Analyzer Generator - - Lex Lex Lexical Analysis- Dr. Alok Kumar 23

  24. Structure Structure of of Lex Lex programs programs declarations %% translationrules Pattern {Action} %% auxiliaryfunctions Lexical Analysis- Dr. Alok Kumar 24

  25. E Ex xample ample . %{ /* definitions of manifestconstants LT, LE, EQ, NE, GT, GE, IF, THEN, ELSE, ID, NUMBER, RELOP */ %} /* regulardefinitions delim ws letter[A-Za-z] digit [0-9] id number [ \t\n] {delim}+ {letter}({letter}|{digit})* {digit}+(\.{digit}+)?(E[+-]?{digit}+)? %% {ws} {/* no action and no return */} if {return(IF);} then {return(THEN);} else {return(ELSE);} {id} {yylval = (int) installID(); return(ID);} {number} {yylval = (int) installNum();return(NUMBER);} 25 Lexical Analysis- Dr. Alok Kumar

  26. Conclusion Conclusion Words of a language can be specified using regular expressions NFA and DFA can act as acceptors Regular expressions can be converted to NFA NFA can be converted to DFA Automated tool lex can be used to generate lexical analyser for a language Lexical Analysis- Dr. Alok Kumar 26

Related


More Related Content