# Free grammars
Regular expressions can describe lists (repetition) but not nesting (recursion).
A [generative grammar](../../../BSc(italian)/Algoritmi%20e%20Principi%20dell'Informatica/src/05.Grammatiche%20Formali.md) is a set of simple rewriting rules that can be repeatedly applied to generate valid strings: the grammar defines a language through the repeated application of these rules. A context-free grammar, also known simply as a free grammar, a type 2 grammar, or a BNF (Backus Normal Form) grammar, is defined as:
$G = \langle V, \Sigma, P, S \rangle$
where:
- $V$ is the set of nonterminal symbols
- $\Sigma$ is the set of terminal symbols
- $P$ is the set of productions (sometimes denoted $R$, for rules)
- $S$ is the axiom, a distinguished symbol of $V$
Note that the grammar is said to be "free" when the production rules apply equally to all occurrences of a nonterminal symbol, without regard to the **context** in which it appears (hence the name context-free). This distinguishes it from a context-sensitive grammar, where the production rules depend on the context in which the nonterminal appears.
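As a small illustrative example (the symbol names below are chosen here, not taken from the course), the following grammar generates sums with arbitrarily nested parentheses, something a regular expression cannot describe:
````
E -> E + T | T
T -> ( E ) | num
````
Here $V = \{E, T\}$, $\Sigma = \{+, (, ), num\}$, the axiom is $S = E$, and $P$ is the set of the four alternatives above.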
- Recursion is the key to generating infinite languages. The necessary and sufficient condition for the language generated by a (clean) grammar to be infinite is that the grammar admits a recursive derivation. Note that a grammar has a recursive derivation iff the corresponding graph of its rules has a circuit.
- A rule can be left-recursive, right-recursive, or both.
- Left and right recursion matter, for example, for operator associativity: to obtain a left-associative operator it is necessary to use left-recursive rules (see the concrete example after this list), such as
````
A -> A terminal | X
````
- Clean grammars are grammars without useless (unreachable or non-generating) nonterminals and without circular derivations.
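For example (illustrative rules), with a left-recursive rule the string `num - num - num` is grouped as `(num - num) - num`, which matches a left-associative minus, whereas the right-recursive variant groups it as `num - (num - num)`:
````
E -> E - num | num        left-recursive:  builds (num - num) - num
E -> num - E | num        right-recursive: builds num - (num - num)
````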
### Classification of rules
Given a rule $A \rightarrow \alpha$ with $A \in V$ (the set of nonterminal symbols) and $\alpha \in (V \cup \Sigma)^*$ (where $\Sigma$ is the set of terminal symbols of the grammar), we can classify the production according to several definitions. **Some** of the most important are:
- Left-recursive rule: the production is called left-recursive if its left part appears as a prefix of its right part, i.e. $A \rightarrow A \delta$
- Right-recursive rule: the production is called right-recursive if its left part appears as a suffix of its right part, i.e. $A \rightarrow \delta A$
There are many normal forms; one of the most important is indispensable for the top-down parsers studied later in the course. This normal form is called non-left-recursive and is characterized by the absence of left-recursive rules or derivations.
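Immediate left recursion can always be removed with the standard transformation, shown here on a generic rule where $\beta$ does not start with $A$ and $A'$ is a new nonterminal introduced for the purpose:
````
A  -> A α | β             original, left-recursive
A  -> β A'                equivalent, not left-recursive
A' -> α A' | ε
````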
### Syntax tree
Given a grammar $G$ and a string in $L(G)$, we can build a syntax tree of the string, where the root is the axiom $S$, the leaves (read from left to right) are the symbols of the string, and each internal node with its children corresponds to a production used to generate the string.
Two possible algorithms to build syntax trees:
- ELL: top-down
- ELR: bottom-up
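As a rough sketch of the top-down idea only (not the ELL construction studied later), the following recursive-descent parser builds a syntax tree for the small illustrative grammar `E -> T + E | T`, `T -> num | ( E )`. Note that the grammar is deliberately right-recursive: a naive top-down parser would loop forever on a left-recursive rule.
````python
# Minimal top-down (recursive-descent) sketch for the illustrative grammar
#   E -> T '+' E | T
#   T -> 'num' | '(' E ')'
# One function per nonterminal; each function returns the subtree it built.

def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {peek()!r}")
        pos += 1
        return expected

    def E():
        # E -> T '+' E | T
        left = T()
        if peek() == '+':
            eat('+')
            return ('E', left, '+', E())
        return ('E', left)

    def T():
        # T -> 'num' | '(' E ')'
        if peek() == 'num':
            return ('T', eat('num'))
        eat('(')
        inner = E()
        eat(')')
        return ('T', '(', inner, ')')

    tree = E()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree

print(parse(['num', '+', '(', 'num', '+', 'num', ')']))
````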
## Ambiguity
With the definition of syntax tree we can also specify what ambiguity means in the context of grammars.
A sentence is ambiguous if it admits distinct syntax trees; the number of distinct trees is the degree of ambiguity.
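A classic illustrative example: with the grammar below, the sentence `num + num + num` admits two distinct syntax trees, grouping either as `(num + num) + num` or as `num + (num + num)`, so its degree of ambiguity is 2.
````
E -> E + E | num
````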
> Avoid ambiguity in the grammar design phase.
## Grammar classification
Chomsky proposed a classification of grammars, thus establishing an order based on their generality and defining the "types" from 0 to 3, where type 3 represents the least general grammars and type 0 is the most general case. In this course, we will only be dealing with type 2 and 3 grammars. Each type of grammar is a subset of the one which precedes it.
### Type 0 grammars
- Family of recursively enumerable languages
- Associated with Turing machines and generates recursively enumerable languages.
- The languages generated are not generally decidable.
- Rules are of the most general type.
### Type 1 grammars
- Family of context-sensitive languages
- Associated with Turing machines whose tape is bounded by the length of the string to be recognized (linear bounded automata).
- Generates context-sensitive languages, which are always decidable.
### Type 2 grammars
- Family of context-free languages
- Associated with non-deterministic pushdown automata.
- Generates context-free languages, which are always decidable; these are the languages this chapter focuses on.
### Type 3 grammars
- Family of regular languages
- Associated with finite-state automata.
- Generates regular languages.
- They can be classified as right-unilinear or left-unilinear (see the example after the table below).
|Grammar | Rule type | Language Family| Recognizer Model |
|:---:|:---:|:---:|:---:|
| Type 2 | $A \rightarrow \alpha$ with $A$ non-terminal and $\alpha \in (V \cup \Sigma)^*$ | Context-free / BNF languages | Pushdown automaton (non-deterministic) |
| Type 3 | $A \rightarrow uB$ (right-unilinear) or $A \rightarrow Bu$ (left-unilinear), never mixed in the same grammar, with $A, B$ non-terminals, $u$ a terminal string, and $B$ possibly absent | Regular (rational) languages | Finite-state automaton |
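For instance, a right-unilinear and a left-unilinear grammar (illustrative, with names chosen here) for the regular language $a^*b$ are:
````
S -> a S | b          right-unilinear

S -> A b              left-unilinear
A -> A a | ε
````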
### EBNF
An Extended Backus Normal Form (EBNF) grammar is a variant that contains exactly one rule for each nonterminal; each rule has a distinct nonterminal on its left side and, on its right side, a regular expression over the terminal and nonterminal alphabets, in which derived operators such as cross, power and optionality may also appear.
````
EXPR -> IMP [imp IMP]
IMP -> TRM (or TRM)*
TRM -> FCT (and FCT)*
FCT -> [not] EQU
EQU -> OBJ ['=' OBJ]
OBJ -> "x" | "y" | "z" | "(" EXPR ")"
````
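The derived operators are only abbreviations: for example, the rule `TRM -> FCT (and FCT)*` above can be rewritten in plain BNF by introducing an auxiliary nonterminal (the name `TRM_TAIL` is chosen here just for illustration):
````
TRM      -> FCT TRM_TAIL
TRM_TAIL -> and FCT TRM_TAIL | ε
````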
### Dyck Language
Model of well-parenthesized languages:
$\begin{aligned}
& \Sigma=\{a, a^{\prime}, b, b^{\prime}\} \\
& S \rightarrow a S a^{\prime} S \mid b S b^{\prime} S \mid \varepsilon
\end{aligned}$
The Dyck language is free (context-free) but not regular.
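Recognizing the Dyck language requires remembering an unbounded number of pending opening symbols, which is why a pushdown automaton suffices while a finite-state automaton does not. A minimal stack-based checker (a sketch, encoding $a, b$ as `'a'`, `'b'` and $a', b'$ as `'A'`, `'B'` for convenience):
````python
# Stack-based recognizer for the Dyck language over {a, a', b, b'},
# encoded here as 'a'/'A' and 'b'/'B'. Illustrative sketch only.

def is_dyck(word: str) -> bool:
    closing = {'A': 'a', 'B': 'b'}   # primed symbol -> matching open symbol
    stack = []
    for symbol in word:
        if symbol in ('a', 'b'):          # opening symbol: push it
            stack.append(symbol)
        elif symbol in closing:           # closing symbol: must match the top
            if not stack or stack.pop() != closing[symbol]:
                return False
        else:
            return False                  # symbol outside the alphabet
    return not stack                      # accepted iff nothing is left open

print(is_dyck("abBA"))   # True:  a b b' a'
print(is_dyck("aB"))     # False: a b'
````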
## Linear language equations
Unilinear (type 3) grammars can be represented as a system of linear language equations: every nonterminal corresponds to one linear equation, whose right-hand side collects the right parts of its rules.
The system of equations can be solved using substitution and the Arden identity: $X = AX \cup B$, with $\varepsilon \notin A$, has the unique solution $X = A^*B$.
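A small worked example (the grammar is illustrative): the right-unilinear grammar $S \rightarrow aS \mid bA$, $A \rightarrow bA \mid \varepsilon$ gives one equation per nonterminal, and applying substitution and the Arden identity yields the corresponding regular expression directly:
$\begin{aligned}
& S = aS \cup bA \\
& A = bA \cup \varepsilon \\
& A = b^* \varepsilon = b^* \quad \text{(Arden identity)} \\
& S = aS \cup b\,b^* \quad \text{(substitution)} \\
& S = a^* b\, b^* \quad \text{(Arden identity)}
\end{aligned}$
So $L(G) = a^* b\, b^*$.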