04.Syntax Analysis - polimi-notes

> Humans can tolerate ambiguity, computers and compilers no. ## Explained to a child And this is what actually happen in reality and in our exercises. If a grammar which is not LR is parsed by a LR tool this two things can happen in practice: - reduce reduce conflicts: basically the compiler gets confused because it sees two possible ways to group things together - shift reduce conflicts: the compiler this time gets confused because it doesn't know if it should take the next block and build with it right away, or if it should wait and try to group it with another block later. The typical example of this is the "dangling else problem" (it's an ambuiguity problem). Exists also LL, which is another way to parse the input. LL parsers use a top-down parsing approach and have the property of not allowing left-recursion in their grammar. Left-recursion would cause the parser to enter into an infinite loop, as it would continually expand the same non-terminal symbol without making progress. LL parsers are also less expressive than LR parsers. The main goal of a syntax analyzer or parser is to determine if a source string is in the language $L(G)$ of a given grammar $G$, and if so, to compute a derivation or syntax tree. We know the same (syntax) tree corresponds to many derivations, notably the leftmost and rightmost ones, as well as to less relevant others that proceed in a zigzag order. Two important parser classes: - Top-down analysis (LL(k) or predictive): Constructs leftmost derivation by starting from root and growing tree towards leaves. Each step is a derivation. - Bottom-up analysis (LR(k) or shift–reduce) : Constructs rightmost derivation in reverse order, from leaves to root. Each step is a reduction. If the string is not in the language, the parser stops and prints an error message. ## Bottom-Up Deterministic Analysis ELR(1) We start to consider EBNF grammars and ELR(1) (The acronym means Extended, Left to right, Leftmost, with length of look-ahead equal to one). ELR(1) parsers implement deterministic automaton with pushdown stack and internal states (called macro-states). - Candidate is a way that ELR uses to verify the match is correct. A pair `(state, token)` is named a candidate (also known as item). - Macro-states consist of a set of candidates. - A closure is the set of all candidates that can be find starting from the initial state and ''looking-ahead''. To calculate the candidates for a given grammar or machine net, we use a function traditionally named closure. which says that the end-of-text character is expected when the entire input is reduced to the axiom - A state can have more candidates. ILR looks multiple candidates in parallel. $C=<q,r>$ where $q$ is a state . A closure is all possible candidates of a given states. - Automaton performs two types of moves: shift and reduce. - Shift move: Reads incoming character, computes next macro-state, pushes token and next macro-state onto stack. - Reduce move: Occurs when top-most stack symbols match recognizing path in machine and current token is in current look-ahead set. At this point, it expands the syntax tree, updates stack by popping matched top-most part, pushing recognized nonterminal symbol and next macro-state. In the exercises we will check the determinism of all this stuff checking conditions related to the shift and reduction movements (conflicts). Doing this means to check if the grammar can be recognized. ## Conflicts and condition for ELR(1) There are three problems that must be verified for determinism in the pilot graph: - Shift-reduce conflict: transition to another state with a character $x$ and there is also a final state with lookahead $x$ . "the pilot can't know if it has finished or not". - Reduce-reduce conflict: in every configuration, at most one reduction is possible so we have a conflict when a state has two or more moves with the same lookahead. - Convergence conflict: multiple transitions reach same state. If the determinism test is passed, the PDA can analyze the string deterministically. ## ELL $ELL(1)$ is a simple and flexible top-down parsing method for $ELR(1)$ grammars with additional conditions. A machine net $M$ (so the grammar) meets the $ELL(1)$ condition if the following three clauses are satisfied: 1) there are no left-recursive derivations 2) the net meets the $ELR(1)$ condition, i.e., it does not have either shift–reduce, or reduce–reduce, or convergence conflicts 3) the net has the single-transition property (STP) ### Using the guide-sets A descending parser is said to be $ELL(1)$ if **at every point** in the machine network where there are multiple possible paths to take, **it is possible to determine which path to choose by looking at the next character in the input**. This means that the guide sets (sets of characters that can appear next) are disjoint for the different paths at the junction. So it's possible to manually check the guide sets at the exam without the pilot graph and using the machine nets looking branching point on each state and checking that the look-ahead sets of the outgoing transitions are disjoint.