Understanding Programming Language Structures

csce 330 l.w
1 / 84
Embed
Share

Programming language structures encompass syntax and semantics, essential for defining the composition and outcome of programs. This involves precise description at different levels such as lexical, concrete, and abstract syntax. Grammars play a crucial role in defining the syntax of a language, facilitating both compiler and human understanding.

  • Programming languages
  • Syntax
  • Semantics
  • Compiler
  • Grammars

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. CSCE 330 Programming Language Structures Syntax (Slides mainly based on Tucker and Noonan) Fall 2022 A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  2. Syntax and Semantics Syntax is the set of rules that specify the composition of programs from letters, digits and other characters. Semantics is the set of rules that specify what the result/outcome of a program is. Problems with English language description of Syntax and Semantics: verbosity ambiguity UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  3. Contents 2.1 Grammars 2.1.1 Backus-Naur Form 2.1.2 Derivations 2.1.3 Parse Trees 2.1.4 Associativity and Precedence 2.1.5 Ambiguous Grammars 2.2 Extended BNF 2.3 Syntax of a Small Language: Clite 2.3.1 Lexical Syntax 2.3.2 Concrete Syntax UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  4. Thinking about Syntax The syntax of a programming language is a precise description of all its grammatically correct programs. Precise syntax was first used with Algol 60, and has been used ever since. Three levels: Lexical syntax Concrete syntax Abstract syntax UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  5. Levels of Syntax Lexical syntax = all the basic symbols of the language (names, values, operators, etc.) Concrete syntax = rules for writing expressions, statements and programs. Abstract syntax = internal representation of the program, favoring content over form. E.g., C: if ( expr ) ... Ada: if ( expr ) then discard ( ) discard then UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  6. 2.1 Grammars A metalanguage is a language used to define other languages. A grammar is a metalanguage used to define the syntax of a language. Our interest: using grammars to define the syntax of a programming language. UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  7. The General Problem of Describing Syntax: Terminology A sentence is a string of characters over some alphabet A language is a set of sentences A lexeme is the lowest level syntactic unit of a language (e.g., *, sum, begin) A token is a category of lexemes (e.g., identifier) 1-7 UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  8. Formal Definition of Languages Recognizers A recognition device reads input strings of the language and decides whether the input strings belong to the language Example: syntax analysis part of a compiler Generators A device that generates sentences of a language One can determine if the syntax of a particular sentence is correct by comparing it to the structure of the generator 1-8 UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  9. Chomsky Hierarchy Regular grammar -- least powerful Context-free grammar (BNF) Context-sensitive grammar Unrestricted grammar Noam Chomsky, 1928- UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  10. Regular Grammar Simplest; least powerful Equivalent to: Regular expression Finite-state automaton Right regular grammar: T*, A N, B N A B A UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  11. Example Integer 0 Integer | 1 Integer | ... | 9 Integer | 0 | 1 | ... | 9 UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  12. Context-Sensitive Grammars Production: , (N T)* i.e., left-hand side can be composed of strings of terminals and nonterminals | | | | UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  13. Unrestricted Grammar Equivalent to: Turing machine von Neumann machine C++, Java That is, can compute any computable function. UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  14. 2.1.1 Backus-Naur Form (BNF) Stylized version of a context-free grammar (cf. Chomsky hierarchy) Sometimes called Backus Normal Form First used to define syntax of Algol 60 Now used to define syntax of most major languages Extended BNF Improves readability and writability of BNF UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  15. BNF Grammar Set of productions: P terminal symbols: T nonterminal symbols: N start symbol: S N A production has the form whereand A N A (N T)* UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  16. Example: Binary Digits Consider the grammar: binaryDigit 0 binaryDigit 1 or equivalently: binaryDigit 0 | 1 Here, | is a metacharacter that separates alternatives. UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  17. 2.1.2 Derivations Consider the following grammar (Ginteger): Integer Digit | Integer Digit Digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 We can derive any unsigned integer, like 352, from this grammar. UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  18. Derivation of 352 as an Integer A 6-step process, starting with: Integer UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  19. Derivation of 352 (step 1) Use a grammar rule to enable each step: Integer Integer Digit UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  20. Derivation of 352 (steps 1-2) Replace a nonterminal by a right-hand side of one of its rules: Integer Integer Digit Integer 2 UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  21. Derivation of 352 (steps 1-3) Each step follows from the one before it. Integer Integer Digit Integer 2 Integer Digit 2 UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  22. Derivation of 352 (steps 1-4) Integer Integer Digit Integer 2 Integer Digit 2 Integer 5 2 UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  23. Derivation of 352 (steps 1-5) Integer Integer Digit Integer 2 Integer Digit 2 Integer 5 2 Digit 5 2 UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  24. Derivation of 352 (steps 1-6) You know you re finished when there are only terminal symbols remaining. Integer Integer Digit Integer 2 Integer Digit 2 Integer 5 2 Digit 5 2 3 5 2 UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  25. A Different Derivation of 352 Integer Integer Digit Integer Digit Digit Digit Digit Digit 3 Digit Digit 3 5 Digit 3 5 2 This is called a leftmost derivation, since at each step the leftmost nonterminal is replaced. (The first one was a rightmost derivation.) UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  26. Notation for Derivations Integer * 352 Means that 352 can be derived in a finite number of steps using the grammar for Integer. 352 L(G) Means that 352 is a member of the language defined by grammar G. L(G) = { T* | Integer * } Means that the language defined by grammar G is the set of all symbol strings that can be derived as an Integer. UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  27. 2.1.3 Parse Trees A parse tree is a graphical representation of a derivation. The root of the tree is the start symbol. Each internal node of the tree corresponds to a step in the derivation. The children of a node represent a right-hand side of a production. Each leaf node represents a symbol of the derived string, reading from left to right. UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  28. E.g., The stepInteger Integer Digit appears in the parse tree as: Integer Integer Digit UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  29. Parse Tree for 352 as an Integer Figure 2.1 UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  30. Arithmetic Expression Grammar The following grammar defines the language of arithmetic expressions with 1-digit integers, addition, and subtraction. Expr Expr + Term | Expr Term | Term Term 0 | ... | 9 | ( Expr ) UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  31. Parse of the String 5-4+3 Figure 2.2 UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  32. Contents 2.1 Grammars 2.1.1 Backus-Naur Form 2.1.2 Derivations 2.1.3 Parse Trees 2.1.4 Associativity and Precedence 2.1.5 Ambiguous Grammars 2.2 Extended BNF 2.3 Syntax of a Small Language: Clite 2.3.1 Lexical Syntax 2.3.2 Concrete Syntax UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  33. 2.1.4 Associativity and Precedence A grammar can be used to define associativity and precedence among the operators in an expression. E.g., + and - are left-associative operators in mathematics; * and / have higher precedence than + and - . Consider the grammar G1: Expr -> Expr + Term | Expr Term | Term Term -> Term * Factor | Term / Factor | Term % Factor | Factor Factor -> Primary ** Factor | Primary Primary -> Integer | ( Expr ) UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  34. Parse of 4**2**3+5*6+7 for Grammar G1 Figure 2.3 UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  35. Associativity and Precedence for Grammar G1 Table 2.1 Precedence 3 2 1 Associativity Operators right left left ** * / % \ + - Note: These relationships are shown by the structure of the parse tree: highest precedence at the bottom, and left-associativity on the left at each level. UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  36. 2.1.5 Ambiguous Grammars A grammar is ambiguous if one of its strings has two or more different parse trees. E.g., Grammar G1 above is unambiguous. C, C++, and Java have a large number of operators and precedence levels Instead of using a large grammar, we can: Write a smaller ambiguous grammar, and Give separate precedence and associativity (e.g., Table 2.1 on the previous slide) UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  37. An Ambiguous Expression Grammar G2 Expr -> Expr Op Expr | ( Expr ) | Integer Op -> + | - | * | / | % | ** Notes: G2 is equivalent to G1. i.e., its language is the same. G2 has fewer productions and nonterminals than G1. However, G2 is ambiguous. UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  38. Ambiguous Parse of 5-4+3 Using Grammar G2 Figure 2.4 UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  39. The Dangling Else IfStatement -> if ( Expression ) Statement | if ( Expression ) Statement else Statement Statement -> Assignment | IfStatement | Block Block -> { Statements } Statements -> Statements Statement | Statement UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  40. Example of Dangling Else With which if does the following else associate? if (x < 0) if (y < 0) y = y - 1; else y = 0; Answer: either one! UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  41. The Dangling Else Ambiguity Figure 2.5 UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  42. Solving the dangling else ambiguity 1. Algol 60, C, C++: associate each else with closest if; use {} or begin end to override. 2. Algol 68, Modula, Ada: use explicit delimiter to end every conditional (e.g., if fi) 3. Java: rewrite the grammar to limit what can appear in a conditional: IfThenStatement -> if ( Expression ) Statement IfThenElseStatement -> if ( Expression )StatementNoShortIf else Statement The category StatementNoShortIf includes all statements except IfThenStatement. UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  43. 2.2 Extended BNF (EBNF) BNF: recursion for iteration nonterminals (abstractions) for grouping EBNF: additional metacharacters { } for a series of zero or more ( ) for a list, must pick one [ ] for an optional list; pick none or one UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  44. EBNF Examples Expression is a list of one or more Terms separated by operators + and - Expression -> Term{ (+ | -)Term} IfStatement -> if (Expression)Statement[else Statement] C-style EBNF lists alternatives vertically and uses opt to signify optional parts. E.g., IfStatement: if ( Expression ) Statement ElsePartopt ElsePart: else Statement UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  45. EBNF to BNF We can always rewrite an EBNF grammar as a BNF grammar.E.g., A-> x { y } z can be rewritten: A -> x A' z A' -> e| y A' (The letter e stands for the empty string.) (Rewriting EBNF rules with ( ), [ ] is left as an exercise.) While EBNF is no more powerful than BNF, its rules are often simpler and clearer. UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  46. Syntax Diagram for Expressions with Addition UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  47. EBNF Grammar from [G&J] (a) Syntax rules <program>::={ <statement>* } <statement>::=<assignment> | <conditional> | <loop> <assignment>::=<identifier> =<expr>; <conditional>::=if<expr> {<statement>+ } | if<expr> { <statement>+ } else { <statement>+ } <loop>::=while<expr> { <statement>+ } <expr> ::=<identifier> | <number>| (<expr>) | <expr><operator><expr> (b) Lexical rules <operator>::= + | - | * | / | = | | < | > | | <identifier>::= <letter> <ld>* <ld>::= <letter> | <digit> <number>::= <digit>+ <letter>::= a | b | c | | z <digit>::= 0 | 1 | | 9 UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  48. Syntax Diagrams from [G&J] UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  49. Contents 2.1 Grammars 2.1.1 Backus-Naur Form 2.1.2 Derivations 2.1.3 Parse Trees 2.1.4 Associativity and Precedence 2.1.5 Ambiguous Grammars 2.2 Extended BNF 2.3 Syntax of a Small Language: Clite 2.3.1 Lexical Syntax 2.3.2 Concrete Syntax UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

  50. 2.3 Syntax of a Small Language: Clite Motivation for using a subset of C: Language Pascal C C++ Java Grammar (pages) 5 6 22 14 Reference Jensen & Wirth Kernighan & Richie Stroustrup Gosling, et. al. The Clite grammar fits on one page (Figure 2.7 on p.38 [T]; next 3 slides), so it s a far better tool for studying language design. UNIVERSITY OF SOUTH CAROLINA UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Department of Computer Science and Engineering

Related


More Related Content