# Free grammars

Regular expressions can provide lists but not nesting (recursion). A [generative grammar](../../../BSc(italian)/Algoritmi%20e%20Principi%20dell'Informatica/src/05.Grammatiche%20Formali.md) is a set of simple rules that can be repeatedly applied to generate valid strings. The grammar defines a language through rule rewriting, i.e. the repeated application of these rules.

A context-free grammar, also simply known as a free grammar, a type 2 grammar or a BNF (Backus Normal Form) grammar, is defined as

$G = \langle V, \Sigma, P, S \rangle$

where:

- $V$ is the set of nonterminal symbols
- $\Sigma$ is the set of terminal symbols
- $P$ is the set of productions (sometimes indicated as $R$, for rules)
- $S$ is the axiom, a particular symbol of $V$

Note that the grammar is said to be "free" because the production rules apply equally to all occurrences of a nonterminal symbol, without regard to the **context** in which it appears (hence the name context-free). This distinguishes it from a context-sensitive grammar, where the production rules depend on the context in which the nonterminal appears.

- Recursion is the key to generating infinite languages: the necessary and sufficient condition for a grammar to generate an infinite language is the existence of a recursive derivation. Note that a grammar has a recursive derivation iff the corresponding graph has a circuit.
- A rule can be left-recursive, right-recursive, or both.
- Left and right recursion matter, for example, when modelling operator associativity: a left-associative operator requires left-recursive rules, as in the rule below.

````
A -> A terminal | X
````

- Clean grammars are grammars without circular derivations.

### Classification of important rules

Given a rule $A \rightarrow \alpha$ with $A \in V$ (the set of nonterminal symbols) and $\alpha \in (V \cup \Sigma)^*$ (where $\Sigma$ is the set of terminal symbols of the grammar), we can classify the production according to many definitions. **Some** of the most important are:

- Left-recursive rule: the production is called left-recursive if its left part appears as a prefix of the right part, $A \rightarrow A \delta$
- Right-recursive rule: the production is called right-recursive if its left part appears as a suffix of the right part, $A \rightarrow \delta A$

There are many normal forms; one of the most important is indispensable for the top-down parsers studied later in the course. This normal form is termed non-left-recursive and is characterized by the absence of left-recursive rules or derivations.

### Syntax tree

Given a grammar $G$ and a string in $L(G)$, we can build a syntax tree of the string: the root is the axiom $S$, the leaves are the sequence of symbols of the string, and the branches are the productions used to generate the string.

Two possible algorithms to build syntax trees:

- ELL: top-down
- ELR: bottom-up

## Ambiguity

With the definition of syntax tree we can also specify what ambiguity means in the context of grammars. A sentence is ambiguous if it admits different syntax trees; the number of distinct trees is its degree of ambiguity.
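
For example (a standard illustration, not taken from these notes), the following grammar for arithmetic expressions is ambiguous:

````
E -> E + E | E * E | x
````

The string `x + x * x` admits two distinct syntax trees (one grouping `x + x` first, the other grouping `x * x` first), so it has degree of ambiguity 2. The usual remedy is to stratify the grammar by precedence, e.g. `E -> E + T | T` and `T -> T * x | x`, which generates the same language unambiguously.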

> Avoid ambiguity in the grammar design phase.

## Grammar classification

Chomsky proposed a classification of grammars, establishing an order based on their generality and defining the "types" from 0 to 3, where type 3 represents the least general grammars and type 0 the most general case. In this course we will only deal with type 2 and type 3 grammars. Each type of grammar is a subset of the type that precedes it (e.g. every type 3 grammar is also a type 2 grammar).

### Type 0 grammars

- Family of recursively enumerable languages.
- Associated with Turing machines; generates recursively enumerable languages.
- The generated languages are not, in general, decidable.
- Rules are of the most general type.

### Type 1 grammars

- Family of context-sensitive languages.
- Associated with Turing machines whose tape length equals that of the string to be recognized (linear bounded automata).
- Generates context-sensitive languages, which are always decidable.

### Type 2 grammars

- Family of context-free languages.
- Associated with non-deterministic pushdown automata.
- Generates context-free languages, which are always decidable.

### Type 3 grammars

- Family of regular languages.
- Associated with finite-state automata.
- Generates regular languages.
- They can be classified as right-unilinear or left-unilinear.

| Grammar | Rule type | Language family | Recognizer model |
|:---:|:---:|:---:|:---:|
| Type 2 | $A \rightarrow \alpha$ with $A$ nonterminal and $\alpha$ any string over $V \cup \Sigma$ | Context-free / BNF languages | Pushdown automaton (non-deterministic) |
| Type 3 | $A \rightarrow uB$ or $A \rightarrow Bu$ (not both in the same grammar), with $A$ nonterminal, $u$ a terminal string, and $B$ a nonterminal or $\epsilon$ | Regular (rational) languages | Finite-state automaton |

### EBNF

An Extended Backus Normal Form (EBNF) grammar is a variant that contains exactly one rule for each nonterminal; each rule has a different nonterminal on the left side and a regular expression over the alphabet on the right side, in which derived operators such as cross, power and optionality can also appear.

````
EXPR -> IMP [imp IMP]
IMP -> TRM (or TRM)*
TRM -> FCT (and FCT)*
FCT -> [not] EQU
EQU -> OBJ ['=' OBJ]
OBJ -> "x" | "y" | "z" | "(" EXPR ")"
````

### Dyck Language

Model of well-parenthesized languages:

$\begin{aligned} & \Sigma=\{a, a^{\prime}, b, b^{\prime}\} \\ & S \rightarrow a S a^{\prime} S \mid b S b^{\prime} S \mid \varepsilon \end{aligned}$

The Dyck language is context-free but not regular.

## Linear language equations

Linear (unilinear) grammars can be represented as a system of linear language equations: every rule corresponds to a linear equation. The system of equations can be solved using substitution and the Arden identity.
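
As a small worked example (this grammar is illustrative, not taken from the course notes), recall the Arden identity: the equation $X = A X \cup B$, with $\varepsilon \notin A$, has the unique solution $X = A^* B$. For the right-unilinear grammar $S \rightarrow a S \mid b$ we get:

$\begin{aligned} & L_S = a\, L_S \cup b \\ & L_S = a^* b \quad \text{(by the Arden identity)} \end{aligned}$

With more nonterminals, the system contains one equation per nonterminal; variables are eliminated by substitution and the Arden identity is applied whenever an equation takes the form $X = AX \cup B$.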