Parsing To extract the grammatical structure of a program, where:
sentence = program
words = tokens For further information:
Aho, Sethi, Ullman, “Compilers: Principles, Techniques, and Tools” (a.k.a, the “Dragon Book”)
Outline of coverage Context-free grammars
Parsing
Tabular Parsing Methods
One pass
Top-down
Bottom-up
Yacc
Grammatical structure of program function-def name arguments stmt-list main stmt expression operator expression expression variable string cout << “hello, world\n” ()
Context-free languages Grammatical structure defined by context-free grammar
statement labeled-statement | expression-statement | compound-statementlabeled-statement ident : statement | case constant-expression : statementcompound-statement { declaration-list statement-list } terminal non-terminal “Context-free” = only one non-terminal in left-part
Parse trees Parse tree = tree labeled with grammar symbols, such that:
If node is labeled A, and its children are labeled x1...xn, then there is a production A x1...xn
“Parse tree from A” = root labeled with A
“Complete parse tree” = all leaves labeled with tokens
Parse trees and sentences Frontier of tree = labels on leaves (in left-to-right order)
Frontier of tree from S is a sentential form
Frontier of a complete tree from S is a sentence L E a L ; E “Frontier”
Example G: L L ; E | E E a | b
Syntax trees from start symbol (L): a a;E a;b;b L E a L E a L ; E L E a L ; E b L E b ; Sentential forms:
Derivations Alternate definition of sentence:
Given , in V*, say is a derivation step if ’’’ and = ’’’ , where A is a production
is a sentential form iff there exists a derivation (sequence of derivation steps) S ( alternatively, we say that S* ) Two definitions are equivalent, but note that there
are many derivations corresponding to each parse tree
Another example H: L E ; L | E E a | b L E a L E a L ; E L E a L ; E b L E b ;
Ambiguity A sentence can have more than one parse tree
A grammar is ambiguous if there is a sentence with more than one parse tree
Example 1
E E+E | E*E | id E E E E E id id id + * E E E E E id id id + *
Notes If e then if b then d else f
{ int x; y = 0; }
a.b.c = d;
Id -> s | s.id
E -> E + T -> E + T + T -> T + T + T -> id + T + T -> id + T * id + T -> id + id * id + T ->
id + id * id + id
Ambiguity Ambiguity is a function of the grammar rather than the language
Certain ambiguous grammars may have equivalent unambiguous ones
Grammar Transformations Grammars can be transformed without affecting the language generated
Three transformations are discussed next:
Eliminating Ambiguity
Eliminating Left Recursion (i.e. productions of the form AA )
Left Factoring
Eliminating Ambiguity Sometimes an ambiguous grammar can be rewritten to eliminate ambiguity
For example, the grammar of Example 1 can be written as follows:
E E +T | T
T T * id | id
The language generated by this grammar is the same as that generated by the previous grammar. Both generate id(+id|*id)*
However, this grammar is not ambiguous
Eliminating Ambiguity (Cont.) One advantage of this grammar is that it represents the precedence between operators. In the parsing tree, products appear nested within additions E T T E id + * id T id
Eliminating Ambiguity (Cont.) An example of ambiguity in a programming language is the dangling else
Consider
S if b then S else S | if b then S | a
Eliminating Ambiguity (Cont.) When there are two nested ifs and only one else.. S if b then S else S a if b then S a S if b then S if b then S else S a a
Eliminating Ambiguity (Cont.) In most languages (including C++ and Java), each else is assumed to belong to the nearest if that is not already matched by an else. This association is expressed in the following (unambiguous) grammar:
S Matched
| Unmatched
Matched if b then Matched else Matched
| a
Unmatched if b then S
| if b then Matched else Unmatched
Eliminating Ambiguity (Cont.) Ambiguity is a property of the grammar
It is undecidable whether a context free grammar is ambiguous
The proof is done by reduction to Post’s correspondence problem
Although there is no general algorithm, it is possible to isolate certain constructs in productions which lead to ambiguous grammars
Comments