Syntax and Semantics

Slide Note

Syntax in language design refers to the structure of expressions defined by rules, while semantics assigns meaning to generated expressions. Explore how languages are classified, the importance of context-free grammar, and the role of BNF in specifying language grammar.

chesebro_a Follow

Uploaded on Mar 08, 2025 | 1 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

Syntax and Semantics Syntax is the form or structure of expressions or statements for a given language defined as a series of rules of the form <class> <class2> item1 we define every rule as a LHS (left hand side) which is a non- terminal, which maps to one or more possible RHS (right hand sides) RHS will consist of non-terminals and terminals and may permit recursive definition when the LHS non-terminal also appears on the RHS as in <id> char | <id>char in Java, a while loop would be defined as while(<bool_expr>) <stmt> Semantics is how we proscribe a meaning to the expressions generated from a language

Terminology Language group of words and the rules for combining those words we usually define a language as a lexicon (words) and grammar (rules) Sentence any legal statement that can be generated in the language Token language category example in a programming language: <identifier>, <type>, <open_paren_operator> Lexeme lowest level syntactic unit in the language examples: int, void, (, *, if

Languages We classify languages into one of four categories: Regular Context-Free Context-Sensitive Recursively Enumerable Here, we are interested in the context-free grammar these include those which can be generated from a language generator all natural languages and programming languages fall into this category You will study the other languages in more detail in 485 Language Recognizer given a sentence, is it in the given language? often called parsers Language Generator given a language, create legal and meaningful sentences We can build a language recognizer if we already have a language generator Grammar description of a language used to generate legal sentences in the language given a grammar, language recognizer can be created

BNF (Backus Naur Form) Equivalent to a context-free language BNF is a notation (or a meta-language) used to specify the grammar of a language The BNF can then be used for language generation or recognition BNF uses rules to map non-terminal symbols (tokens) into other non- terminals and terminals (lexemes) We define a BNF Grammar as G={alphabet, rules, <start>} alphabet consists of those symbols used in the rules both terminal symbols and non-terminal symbols, non-terminal symbols are placed in < > rules map from a non-terminal to other elements in the alphabet for instance, a rule might say <A> a<B> | b<A> rules can be recursive as shown above where an <A> can be applied to generate a terminal (b) and another <A> <start> is a non-terminal which is the starting point for a language generator and must be on at least 1 rule s left hand side

BNF Examples Here s a partial example of a BNF grammar taken from Wikipedia: <postal-address> ::= <name-part> <street-address> <zip> this rule tells us how to generate a legal postal address <name-part> ::= <personal-part> <last-name> <suffix-part> <EOL> | <personal-part> <name-part><EOL> legal name consists of a personal part followed by a last name followed optionally by a suffix and EOL, the | is used to enumerate multiple possible mappings, or OR <personal-part> ::= <first-name> | <initial> "." personal part is a first name or an initial and period <street-address> ::= <house-num> <street-name><EOL> | <house-num> <street-name> <apt-num><EOL> street address is house number, street name and optionally apt number <zip> ::= <town-name> "," <state-code> <zip-code><EOL> <suffix-part> ::= "Sr." | "Jr." | <roman-numeral> | ""

Examples of Grammar Rules <program> -> begin <stmt_list> end Notice both terminals and non-terminals on the right side Recursion is used as necessary <ident_list> -> ident | ident, <ident_list> The | symbol means or so that an <ident_list> can be a single ident, or an ident, a comma, and more of an ident_list Other examples are: <assign> -> <var> = <expression> <if_stmt> -> if <logical_expr> then <stmt> | if <logical_expr> then <stmt> else <stmt> <stmt_list> <stmt> | <stmt> ; <stmt_list> <stmt> <assign> | <if_stmt> <logical_expr> <boolean_expr> AND <boolean_expr> | <boolean_expr> OR <boolean_expr> | <boolean_expr> <boolean_expr> <expr> <relational_op> <expr> <relational_op> < | > | = | != | <= | >= Note defined here is <expr>, or blocks or loops

Example Grammar <program> -> begin <stmt_list> end <stmt_list> -> <stmt> | <stmt> ; <stmt_list> <stmt> -> <var> = <expr> <var> -> a | b | c | d <expr> -> <var> + <var> | <var> - <var> | <var> With this grammar, we can generate simple programs, anything generated from a grammar starting with the start symbol (<program> in this case) is called a derivation, on the left is a derivation from this grammar but the program on the right cannot be derived, can you figure out why? begin a = b + c; b = c end end begin d = a + b c

Parse Trees <assign> <id> = <expr> <id> A | B | C <expr> <id> + <expr> | <id> * <expr> | (<expr>) | <id> <assign> A hierarchical structure displaying the derivation of an expression in the grammar Leaf nodes are always terminals, non-leaf nodes are non-terminals <id> = <expr> A <id> * <expr> B ( <expr> ) <id> + <expr> Parse tree for the derivation <assign> <id> = <expr> A = <expr> A = <id> * <expr> A = B * <expr> A = B * (<expr>) A = B * (<id> + <expr>) A = B * (A + <expr>) A = B * (A + <id> ) A = B * (A + C) A <id> C

An Ambiguous Grammar <assign> <assign> <id> := <expr> <id> A | B | C <expr> <expr> + <expr> | <expr> * <expr> | (<expr>) | <id> This grammar creates two parse trees for: <assign> A = B + A * C The reason for the ambiguity is that + and * are treated with equal precedence but in fact * should have higher precedence <id> = <expr> A <expr> + <expr> <id> <expr> * <expr> B <id> <id> <assign> A C <id> = <expr> A <expr> * <expr> <expr> + <expr> <id> <id> <id> C B A

An Unambiguous Grammar <assign> <id> := <expr> <id> A | B | C <expr> <expr> + <term> | <term> <term> <term> * <factor> | <factor> <factor> (<expr>) | <id> Here, we force operator precedence by making a multiplication occur lower in the tree by deriving it through an additional rule ( ) having the highest precedence requires the most derivation to get to it Assume we wanted to add another operator, unary (as in -5), how would we add it? What about adding ** (exponent)?

Derivation and Parse Tree of the Unambiguous Grammar <assign> <assign> <id> = <expr> A = <expr> A = <expr> + <term> A = <term> + <term> A = <factor> + <term> A = <id> + <term> A = B + <term> A = B + <term> * <factor> A = B + <factor> * <factor> A = B + <id> * <factor> A = B + C * <factor> A = B + C * <id> A = B + C * A <id> = <expr> A <expr> + <term> <term> <term> * <factor> <factor> <factor> <id> <id> <id> A B C

Associativity We maintain operator precedence through additional rules whereby higher precedence operators appear after the application of more rules Should we worry about associativity? in math, A + B + C = C + A + B should we force them to generate the same parse tree? It doesn t seem worthwhile, and yet if A, B and C are floats instead of ints, then A + B + C may not equal C + A + B based on the amount of precision How do we enforce associativity? We will require that all rules in our BNF be left recursive for left associativity and right recursive for right associativity Left recursive means that a recursive non-terminal must appear to the left of any other non-terminals <expr> <expr> + <term> is left recursive and <factor> <expr> ** <factor> is right recursive

If-Then-Else An <if_stmt> should be an if or an if-else Here is a simplistic grammar rule <if_stmt> if <bool_expr> then <stmt> | if <bool_expr> then <stmt> else <stmt> We have a problem in that <stmt> can itself be an if-else statement which can lead to ambiguity Look at the following statement generated from the above rule, is the else X := Y clause associated with the first condition or the second? if X > 0 then if Y > 0 then X:=0 else X:=Y this is known as the dangling else problem

Two Interpretations Because there are two parse trees, our grammar rule is ambiguous

Solution We need a more clever grammar rule We create higher level categories called <matched> and <unmatched> an if statement can be unmatched an if-else statement must be matched, or its else clause unmatched forces the else clause to be associated with the most recent condition we can also resolve this problem by using blocks ({ }, begin/end) <stmt> <matched> | <unmatched> <matched> if <bool_expr> then <matched> else <matched> | any non-if statement <unmatched> if <bool_expr> then <stmt> | if <bool_expr> then <matched> else <unmatched>

Extended BNF Grammars 3 common extensions to BNF: [ ] used to denote optional elements (saves some space so that we don t have to enumerate options as separate possibilities) { } used to indicate 0 or more instances ( ) for a list of choices These extensions are added to a BNF Grammar for convenience allowing us to shorten the grammar We do not need to use EBNF, but using it will shorten our grammars (see the example on the next slide)

Comparing BNF and EBNF BNF: <expr> <expr> + <term> | <expr> - <term> | <term> <term> <term> * <factor> | <term> / <factor> | <factor> <factor> <exp> ** <factor> | <exp> <exp> (<expr>) | <id> EBNF: <expr> <expr> {(+ | -) <term>} <term> <term> {(* | /) <factor>} <factor> <exp> { ** <factor>} <exp> (<expr>) | <id>

Attribute Grammars A BNF grammar can define the syntax of a language but it cannot ensure that all grammar rules are followed in the language the BNF grammar lacks static semantics Example language rules that cannot be handled by BNF include matching the number of and type of parameters between a function call and the function header making sure the type of an expression on the RHS of an assignment statement is compatible with the variable on the LHS ensuring identifier names are not reserved words Attribute grammars add to BNF grammars rules

Attribute Grammar Components Synthesized attributes information derived lower in the tree and passed up to be used in rules Inherited attributes information passed down the tree Semantic functions rules associated with grammar rules that compare synthesized attributes to inherited attributes to ensure correctness if a rule fails, we have a syntax error that differs from not following the grammatical part of a rule Intrinsic attributes leaf node attributes that may be computed during the derivation of a grammar rule or by a database lookup (for instance, the length in characters of an identifier)

Example: Identifier Length Some languages restricted identifier length Here, we define attribute grammar rules to ensure that an identifier is no more than 31 characters in length first, we have our grammar rule <identifier> <id><identifier> | <id> we have a synthesized value, length, which we either initialize to 1 (if <id> is used) or add 1 to the inherited value <identifier>.length = 1 <identifier>.length <identifer>.length + 1 we define a predicate that captures our syntactic rule of legal lengths for identifiers Predicate: <identifer>.length <= 31 finally, we define our rule for <id> which does not have an attribute <id> _ | <letter> | <digit> | _<id>

Example: Assignment State Type Matching <assign> <var> = <expr> Predicate: <expr>.expected_type <var>.actual_type <expr> <var>[2] + <var>[3] Attribute: <expr>.actual_type int (if both vars are ints), real otherwise Predicate: <expr>.actual_type = <expr>.expected_type <expr> <var> Attribute: <expr>.actual_type <var>.actual_type Predicate: <expr>.actual_type = <expr>.expected_type <var> A | B | C Attribute: <var.actual_type> look- up(<var>.string)

Example <assign> <expr> expected_type Assume A is a float and B is an int: <var> actual_type actual_type actual_type <var>.actual_type = float <var>[2].actual_type = float <var>[3].actual_type = int <expr>.actual_type = float (derived from var[2] and var[3] through semantic rule) <expr>.expected_type = float (inherited from <assign> which is inherited from <var>) <var>[2] <var>[3] A = A + B <expr>.expected_type inherited from parent <var>[1].actual_type lookup (A) <var>[2].actual_type lookup (B) <var>[1].actual_type =? <var>[2].actual_type <expr>.expected_type = <expr>.actual_type, so predicate is satisifed, no syntax error <expr>.actual_type <var>[1].actual_type <expr>.actual_type =? <expr>.expected_type

Describing Semantics There are several ways to formally describe the meaning of an instruction given the method, we generate semantic rules for each grammatical rule we can then use these to generate the semantics of specific instructions of a program Approaches English (whether program comments, pseudocode, etc) unfortunately, such an approach is vague, informal and imprecise because natural languages are both ambiguous and cannot be automatically generated Operational Semantics how the statements are executed Axiomatic Semantics what results to expect Denotational Semantics functional mapping for how the statement affects the program

Example Consider the code below where we are looking for the minimum of the 3 vars if(a<b && b<c) min = a; else if(b<a && b<c) min = b; else min = c; Let s assume we define this as minimum this is an English term and so is meaningless to a computer we can define the operationality of this code by stating what changes occur when executing it we can define the expectations of what will happen when the code terminates ultimately, we will find all 3 semantic approaches to be somewhat unsatisfactory

Operational Semantics What happens in the machine when the code is executed? Think of the operational semantics as a run-trace through the program by indicating the changes to the state we can implement this through an interpreter, compiler or assembler this will show us how the given instruction is operationalized This is a simple mechanistic description of the statement it does not help us understand the statement This is the easiest form of semantics but also the least useful we do discover anything inherent about the meaning just how it will execute

Example: C For-Loop for(expr1; expr2; expr3) stmt; We are literally showing how the C for-loop operates expr1; loop: if expr2 = 0 goto out stmt; expr 3; goto loop out:

Axiomatic Semantics Used mainly to prove correctness of code each statement in the language has associated assertions what we expect to be true before and after the statement executes we list these assertions as pre- and post-conditions so that given the state prior to the instruction, we can prove what must be true after the instruction executes Basic form of an axiomatic semantic is {P} S {Q} means if P is true before S, then Q is true after S We define how to determine Q given P and S

Finding Weakest Pre-condition Rather than starting with a pre-condition, we actually work our way backward what is the intent of the program? this tells us the expected post-condition work backward to derive the weakest pre-condition the weakest pre-condition is the most general, or the least restrictive Example assignment statement sum = 2*x+1; We expect sum > 1 after execution weakest precondition is then that x > 0 other pre-conditions exist such as x > 10, x > 50, x > 1000

Assignment Statement Rule Use the notation: {Qx E} x = E {Q} if Q is true after the assignment, then Qx E is true prior The notation Qx E means replace all instances of x in Q with E Example a=b/2-1; {a < 10} replace a in {a < 10} with b / 2 1 solve for b Qx E} is {b / 2 1 < 10} which is simplified to {b < 22} {b < 22} a = b / 2 1; {a < 10}

Additional Examples x = 2 * y 3; {x > 25} pre-condition is {2 * y 3 > 25} or {y > 14} c = d * e 4; {c > 0} pre-condition is {d * e 4 > 0} or {d * e > 4}, we can list this as {d > 4 / e} or {e > 4 / d} but in doing so, we can t allow d or e to be 0 so we should have {d > 4 / e & d != 0 & e != 0} a = b / c {a > 1} pre-condition is {b / c > 1} or { b > c & c != 0}

Sequences We will express a series of statements S1, S2, S3, ..., Sn as {P} S1, S2, S3, , Sn {Q} We now must find individual pre- and post-conditions between each pair of statements Thus, this becomes {P} S1 {Q1}; {Q1} S2 {Q2} ; {Q2} S3 {Q3};... {Qn} Sn {Q} To find {P}, we work backward derive Qn derive Qn-1 derive Q2 derive Q1 derive P

Examples y = 3 * x + 1; x = y + 3; if our post-condition is {x < 10} then our pre-condition between the two statements is {y+3 < 10} or {y < 7} our pre-condition before the first statement is {3 * x + 1 < 7} or {x < 2} if x < 2 before the first statement, then x < 10 after the second statement y = 3 / b; x = y + 1; c = x * 3; {c > 30} third precondition (before c = ) is x * 3 > 30 or x > 10 second precondition (before x = ) is y + 1 > 10, or y > 9 first precondition is 3 / b > 9, or b < 1 / 3 & b != 0 see the note in the notes page

Rule of Consequence The rule of consequence is shown as follows: {Q}, S P} { = = P' P, Q Q' {P' S } {Q' } => means implies If P implies P and Q implies Q and we have already proven that {P} S {Q} is true, then we can infer {P } S {Q } is also true notice that P => P means that P is a weaker condition than P whereas Q => Q means that Q is a stronger condition than Q this allows us to take a proven rule and weaken its postcondition and strengthen its precondition = = 3), = { x 3} x x 3 - {x 0}, (x 5) = (x (x 0) (x 0) 0} {x 5} x x { 3 - x

Selection Axiomatic Semantic Given a statement: if (B) S1; else S2; B is a boolean condition if B is true, execute S1 else execute S2 The semantic rule is {B & P} S1 {Q}, {(!B) & P} S2 {Q} This says that for Q to be our post-condition, we have two possible preconditions B & P !B & P We need to derive P (the same P) so that Q is true if B is true and we execute S1, and so that Q is true if !B is true and we execute S2

Example if (x > 0) y--; else y++; Suppose the post-condition is {y > 0} the pre-condition for the if-clause is {y > 1} the pre-condition for the else-clause is {y > -1} the condition {y > 1} is subsumed by the condition {y > -1} (that is, if {y > 1} is true, then {y > -1} must also be true) So, we select {y > 1} as our weakest pre-condition we cannot use {y > -1} because, if x > 0 and y = -1, our post-condition is not true

Logical Pretest Loops Here, we are talking about a while loop, which will look like this while (B) S; Let s assume {Q} is true afterward, what does that mean about {P}? as we do not know how many times S may execute (0 or more), we have to write P in such a way that it is true for all possible cases also, P must be true whether B is true or not The catch with P is that P must continue to be true between loop iterations as well since {P} is true prior to each execution of S {P} should only become untrue after the loop terminates

Loop Invariant To derive P, we create I, a loop invariant the invariant will always be true both before and after each S executes until B is no longer true The pre-condition must include {I} and the post- condition must include {I and Not B} determining a loop invariant is difficult and does not necessarily seem to help us understand the loop so we largely skip over the details here, see the notes page for some examples The rule of consequence for the while loop is shown below

Program Proofs Finding an invariant is not necessarily easy the invariant must include the loop s terminating condition but also be weak enough to describe what happens during each loop iteration in using axiomatic semantics for a loop, the requirement that the invariant include the terminating condition is often ignored in such a case, the axiomatic description is known as offering only partial correctness rather than total correctness if the terminating condition is met By combining these axiomatic rules, we can prove the correctness of an entire program consider the example on the next slide which swaps two variable values our precondition requires that the two variable values have in fact been swapped, now we will prove it

Examples {x = A AND y = B} t = x; x = y; y = t; {x = B AND y = A} {P} t = x; x = y; y = t; {x = B AND y = A} {P} t = x; {P1}, {P1} x = y; {P2}, {P2} y = t; {x = B AND y = A} {P2} is {x = B AND t = A} {P1} is {y = B AND t = A} {P} is {y = B AND x = A} and {y = B AND x = A} => {x = A AND y = B} The chapter offers a more complete example if you are interested

Denotational Semantics This form of semantics is a more rigorous method of describing the meaning of a program than our previous approaches denotational semantics is based on recursive function theory we derive a recursive function that defines the affects of an instruction on the state of the program the function maps from an instance of a mathematical object (the state of the machine) onto another mathematic object Example: rule for generating a binary number <bin_num> 0 | 1 | <bin_num>0 | <bin_num>1 Mbin(<bin_num>) = Mbin(0) = 0 Mbin(1) = 1 Mbin(<bin_num>0)=2*Mbin(<bin_num>) Mbin(<bin_num>1)=2*Mbin(<bin_num>) + 1 See the notes page for an example of using this function

Additional Functions Expressions Me(E, s) if VARMAP(i, s) = undef for some i in E then error else E , where E is the result of evaluating evaluating E after setting each variable i in E to VARMAP(i, s) Assignment Statements Ma(x := E, s) if Me(E, s) = error then error else s = {<i1 ,v1 >,<i2 ,v2 >,...,<in ,vn >}, where for j = 1, 2, ..., n, vj = VARMAP(ij, s) if ij <> x = Me(E, s) if ij = x See the notes page for some explanation Ml(while B do L, s) = if Mb(B, s) == undef then error else if Mb(B, s) == false then s else if Msl(L, s) == error then error else Ml(while B do L, Msl(L, s))

Syntax and Semantics

Download Presentation

Presentation Transcript

Related

More Related Content