Lecture Notes Alex Aiken CS 164, Fall '94 Lecture 10 ---------- I. Course Administration - Send me your name and SID or SS# if you are an extension student. - WAII is assigned today. II. Review from last time: - A handle is a production X -> A and position in a right-sentential form BAw such that S -*> BXw -> BAw via a rightmost derivation. - Handles always appear at the top of the stack in shift-reduce parsing. - In particular, the handle is never to the left of the rightmost non-terminal (can't be because the parser is tracing a rightmost derivation) - Therefore shift and reduce moves are sufficient (never need to move the "#" to the left). III. A digression: Conflicts - General shift-reduce strategy: If there is no handle on the stack, shift. If there is a handle, reduce. - But what if there is a choice? - if it is legal to either shift or reduce, then there is a shift-reduce conflict - if it is legal to reduce by two or more productions, then there is a reduce-reduce conflict. - Conflicts generally arise with ambiguous grammars. (All ambiguous grammars generate conflicts, but so do other kinds of grammars). For example: E -> E * E | E + E | id # id * id + id ... E * E # + id reduce E # + id shift E + # id shift ... E + E reduce E Alternatively, we can also: # id * id + id ... E * E # + id shift E * E + # id shift ... E * E + E reduce E * E reduce E In this situation we can either reduce by E * E or shift a + in second step shown. Which one is chosen determines the associativity of * and +. - One can always rewrite ambiguous grammars of this sort to encode precedence in the grammar. - However, such grammar hacking is tedious and results in an unnatural grammar that is hard to understand. - bison permits precedence and associativity declarations. For example, one can say that * has higher precedence than +. - The effect of precedence and associativity declarations is to cause bison to resolve conflicts in certain ways. For example, declaring * to have higher precedence than + causes the parser to reduce in the example above. - Be careful with precedence and associativity declarations. They are basically heuristics and, while they usually do the "right" thing, you can get into trouble. Make sure you understand precedence and associativity declarations in bison. IV. Detecting Handles - Our favorite grammar: E -> ( E ) | E' | E' + E E' -> int * E' | int - To implement shift-reduce parsing, we must detect handles. - Important: Just because there are symbols on the stack that can be reduced does not mean there is a handle. Reusing an example from Lecture 9, the string int # * int + int can be reduced by E' -> int to E' # * int + int. But such a reduction is not part of any rightmost derivation and therefore is not a handle. - Of course, the rhs of some production must appear at the top of the stack, but this is only a necessary and not sufficient for having a handle. - Put another way, whether a given stack has a handle can depend on the entire contents of the stack. - Fact: If a grammar is unambiguous, then every sentential form has a unique handle. For simplicity, we assume grammars are unambiguous. The book takes a more general approach. - Definition: The set of "viable prefixes" are all A such that A#w is a configuration of a shift-reduce parse and Aw is a right sentential form. Equivalently, the viable prefixes are the prefixes of (rightmost derived) sentential forms that don't extend beyond the right end of the handle. - Note: a viable prefix is "viable" because it can be extended by adding terminals to form a valid sentential form of some rightmost derivation. As long as there is a viable prefix, no parsing error has been detected. - Important Fact #3 about bottom-up parsing is: The set of viable prefixes is a regular language. - Fact #3 is completely non-obvious. We won't prove it formally, but instead show how to construct an automaton that accepts viable prefixes. - Definition: An item is a production of a grammar with a "." somewhere on the rhs. For example, E -> . ( E ) E -> (. E ) E -> ( E ) . are all items. For epsilon productions X -> \epsilon the only item is X -> . Items are also called LR(0) items. - Intuition: - The problem in recognizing viable prefixes is that the stack may have only bits and pieces of the rhs of productions (if it had the complete rhs we could reduce). - Part of the trick is that the bits and pieces---at least in accepting parses---are always prefixes of the rhs of productions. - Example: Consider the input ( int ). Then the stack ( E # ) is a state of the shift reduce parser. "( E" is a prefix of the rhs of E -> ( E ), which we will reduce after the next shift. - An item E -> ( E . ) says we have so far seen the prefix ( E of this rhs and hope to see ). - There is more to the story. If the stack does not have a prefix of just one rhs R1, then the top of the symbols above the prefix of R1 must be a prefix of some rhs R2 that will eventually be reduced to the remainder of R1. Recursively, there can be symbols above R2, and so on. - Example: consider ( id ) The stack ( int * # int ) is a state of the shift-reduce parser. ( is a prefix of the rhs of E -> ( E ), and int * is a prefix of the rhs of E' -> int * int. - The Basic Idea: Recognizing viable prefixes is equivalent to recognizing a sequence of prefixes of rhs's of productions, where each prefix in the sequence can eventually reduce to part of the missing suffix of its predecessor. - Algorithm: Given a grammar G, construct an NFA recognizing the viable prefixes of right sentential forms of G. (1) Add a dummy production S' -> S to G (2) The states of the NFA are the items of G with the extra prod. (3) For each item E -> A . X B and production X -> C, add an edge labeled \epsilon: E -> A . X B ===epsilon===> X -> . C (4) For each item E -> A.XB add an edge labelled X: E -> A . X B ===X===> A X . B (5) Every state is an accepting state. (6) Start state is S' -> . S - Example: The NFA for the example grammar above. Item Eps Transitions Other Transitions On S' -> . E E -> . ( E ) S' -> E . E E -> . E' E -> . E' + E E' -> . int * E' E' -> . int E -> . ( E ) E -> ( . E ) ( E -> . E' E' -> . int * E' E -> E' . E' E' -> . int E -> . E' + E E' -> . int * E' E -> E' . + E E' E' -> . int E' -> .int * E' E -> int . * E' int E' -> . int E -> int . int E -> ( . E ) E -> . ( E ) E -> ( E . ) E E -> . E' E -> . E' + E E' -> . int * E' E' -> . int E -> ( E . ) E -> ( E ) . ) E -> E' . + E E -> E' + . E + E -> E' + . E E -> . ( E ) E -> E' + E . E E -> . E' E -> . E' + E E' -> . int * E' E' -> . int E -> int . * E' E -> int * . E' * E -> int * . E' E' -> . int * E' E -> int * E' . E' E' -> . int E' -> . int E' -> int . int - We can translate this NFA into a DFA by taking the \epsilon-closure of the NFA states. State Items Transition On 1 S' -> . E 9 E E -> . ( E ) 2 ( E -> . E' + E 5 E' E -> . E' 3 int E' -> . int * E' E' -> . int 2 E -> ( . E ) 7 E E -> . ( E ) 2 ( E -> . E' + E 5 E' E -> . E' 3 int E' -> . int * E' E' -> . int 3 E' -> int . * E' 4 * E' -> int . 4 E' -> int * . E' 10 E' E' -> . int * E' 3 int E' -> . int 5 E -> E' . + E 6 + E -> E' . 6 E -> E' + . E 11 E E -> . ( E ) 2 ( E -> . E' + E 5 E' E -> . E' 3 int E' -> . int * E' E' -> . int 7 E -> ( E . ) 8 ) 8 E -> ( E ) . 9 S' -> E . 10 E' -> int * E' . 11 E -> E' + E . - Definition: An item X -> B . C is valid for a viable prefix AB if there is a rightmost derivation S' -*> AXw -> ABCw - Intuition: the valid items for A are the prefixes of productions we might see after A. - Fact: An item I is valid for a viable prefix A if the DFA recognizing viable prefixes terminates on input A in a state containing I. - Notes: - An item is generally valid for many prefixes. Example: The item E -> (. E ) is valid for prefixes ( (( (((... - The states of the DFA are called the "canonical collections of items" or the "canonical collections of LR(0) items" - The book gives another way of constructing LR(0) items. V SLR Parsing Example - A SLR (Simple LR) parsing example was given in class using the automaton above. The SLR parsing algorithm is given at the start of Lecture 11. - Consider int * int. Configuration DFA Halt State Action # int * int $ 1 shift int # * int $ 3 (* not in Follow(E')) shift (note we avoid mistake here) int * # int $ 4 shift int * int # $ 3 ($ is in Follow(E')) red E' -> int int * E' # $ 10 red E' -> int *E' E' # $ 6 red E -> E' -- accept --