Last modified 3/27/95.
Since there may be any (finite) number of such productions for each variable A in V, P maps V into 2^(V U T)*, subsets of (V U T)*. The function P is often thought of as a set of individual productions of the form A -> alpha, where A is in V and alpha is in (V U T)*. In that case, it is said that A -> alpha is a production of P.
It is useful to talk about the height of a derivation tree rooted at a variable A that derives a sentential form alpha. For this, we superscript the => symbol with the number of steps in parentheses:
NN = N = 0
REPEAT
N = NN
NN = N U {A in V | P(A) ^ N* <> 0}
UNTIL (N = NN)
RETURN N
The algorithm adds variables that can derive a null string in one or more
steps (depending on the iteration at which the variable is added) and halts
when a fixed point is reached. We now use this to eliminate e-productions
from a grammar.
This can be shown
graphically.
P2 = P1 = P
N = FindNull(G)
REPEAT
P1 = P2
FOREACH (A in V)
P2(A) = P1(A) U
{alpha1 alpha2 | (alpha1 B alpha2) in P1(A) and B in N}
UNTIL (P1 = P2)
FOREACH (A in V) P1(A) = P1(A) - {e,A}
RETURN
NAD = {B | B in P(A)}
REPEAT
AD = NAD
NAD = AD U {B in V | Exists C in AD s.t. B in P(C)} - {A}.
UNTIL (AD = NAD)
RETURN AD
The following algorithm assumes there are no e-productions in G.
FOREACH (B in V) AD(B) = Find-A-Deriv(B,G)
FOREACH (A in V, B in AD(A))
P1(A) = P(A) U P(B)
FOREACH (A in V)
P1(A) = P1(A) - AD(A)
RETURN
When P1(A) is first created,
it includes all the right-hand sides for every non-e-production for
any variable B that A can derive. Finally, all the unit productions
are removed. Note that, provided the appropriate minor modifications
are made, these two steps may be done in either order!
NL = {A | P(A) ^ T* <> 0 }
REPEAT
L = NL
NL = L U {B in V | P(B) ^ (T U L)* <> 0 }
UNTIL (L = NL)
RETURN L
On iteration i, the algorithm adds those variables that take at
least i steps to derive a string of terminals.
NR = {S}
REPEAT
R = NR
NR = R U {A in V | Exists B in R s.t. P(B) ^ (VUT)*A(VUT)* <> 0 }
UNTIL (R = NR)
RETURN R
On iteration i, the algorithm adds those variables that only appear in
sentential forms that take least i steps to derive from S.
V' = FindLive(G)
FOREACH (A in V1)
P'(A) = P(A) ^ (V' u T)*
G' =
V1 = FindReachable(G')
FOREACH (A in V1)
P1(A) = P(A) ^ (V1 u T)*
RETURN
Only reachable and live variables can be useful (although these alone are
not sufficient!), so the algorithm restricts attention to them. Only
productions that involve live and reachable variables can contribute to
L(G), so the rest are pruned. A variable may be both live and reachable,
but may appear only in sentential forms with non-live variables, and so be
useless. For this reason, it is important that the grammar G' with only
live variables be produced first, since the pruning operation on P will
eliminate all productions involving dead variables. If a live, reachable
variable can only appear in a sentential form with non-live variables, then
it will not be reachable in the pruned grammar, since the necessary step
that produced the non-live variable(s) with which it must appear is no
longer available.
Determine the pumping lemma constant N for G in GNF, then run all possible derivations with N to 2N steps. This will generate all strings in L with length n, N <= n <= 2N. L is infinite if and only if there is at least one such string.
Again, determine the pumping lemma constant N for G in GNF, then run all possible derivations of length up to N. L is empty if and only if there are no strings of length less than N.
For this, using a grammar in GNF suggests an algorithm that parses one input character at a time, and keeps track of all sentential forms that have a prefix of the input as a prefix of the sentential form consisting of all of the its terminal symbols.
The CYK algorithm takes a grammar in CNF, and using dynamic programming parses the string bottom-up. It may also be used to determine the number of distinct derivation trees in G for a particular string.