Last modified 2/21/95.
NN = N = 0 REPEAT N = NN NN = N U {A in V | P(A) ^ N* <> 0} UNTIL (N = NN) RETURN NThe algorithm adds variables that can derive a null string in one or more steps (depending on the iteration at which the variable is added) and halts when a fixed point is reached. We now use this to eliminate e-productions from a grammar.
P2 = P1 = P N = FindNull(G) REPEAT P1 = P2 FOREACH (A in V) P2(A) = P1(A) U {alpha1 alpha2 | (alpha1 B alpha2) in P1(A) and B in N} UNTIL (P1 = P2) FOREACH (A in V) P1(A) = P1(A) - {e,A} RETURN
Proof
That G1 has no e-productions is evident from the last step of Algorithm 11.1. Showing that L(G1) = L(G) - {e} requires some delicacy. It is possible but difficult to prove this using induction on the number of steps in the derivation. Using the depth of the derivation tree instead works much better. Also, while it is only necessary to reason about derivations that start with the start symbol and derive a terminal string, it is no harder to prove things about derivations from any symbol and this stronger result makes it easier to prove the final desired result. In this particular case, it is not necessary to deal derivations of arbitrary sentential forms, which saves a fair amount of work. Also note that when proving that a word from one language is in the other, it is not necessary to state the exact number of steps in the derivation for the second language, only the first (the one on which induction is applied). The idea of the proof is to induct on the height of the derivation tree for substrings of the desired word, building it bottom-up.
Claim 1: L(G) - {e} contained in L(G1)
We prove this by induction on the number of steps n in the derivation of a string in T* from a variable in V. This stronger form allows us to carry out the inductive step easily, and we can conclude that the result holds for derivation from S as a special case.
Base (n = 1)
If n=1, then A => x in T*-{e} in one step, so x in P(A). The only productions removed from P to produce P1 either involve a variable on the right-hand side, or the null string. Since x in T*, removal of the former cannot prevent G1 from producing s from A, and since x cannot be e, removal of the latter cannot prevent G1 from producing s from A either.
IH
Assume that K > 0 and that for any x in T*-{e} and variable A in V, (A =>K_G x) => (A =>*_G1 x), i.e., if A derives x in K steps in G, then A also derives x in G1.
IS
Let x in T*-{e} and variable A in V such that A =>K+1_G x. Consider the first production of this derivation, A =>_G X1X2...Xn, where Xi in (V U T) for i = 1,2,...,n. It must be the case that x=x1x2...xn, where xi in T*, and where Xi =>*_G xi if Xi in V. Since the whole derivation has but K+1 steps, and one step is needed to obtain the sentential form X1X2...Xn, if Xi in V, Xi =>ji_G xi (Xi derives xi in ji steps in G) with ji <= K. Thus, either the conditions for the IH hold, or Xi in T, or xi = e. If xi = e, then Xi is a nullable variable. Let Y1Y2...Ym be the sentential form X1X2...Xn with all such Xi that derive e in the derivation in G omitted. Then Y1Y2...Ym in P1(A) by construction of P1. For each Xi in V in the sentential form for which Xi =>ji_G xi in T+, Xi =>*_G1 xi as well by the IH. Thus A =>_G1 Y1Y2...Ym =>*_G1 x1x2...xn = x. Hence the IH holds for K+1 and is thus true for all K=1,2,.... In particular, if x in L(G) - {e}, then S =>n_G x for some n, so S =>*_G1 x, and x in L(G1).
QED Claim 1
Claim 2: L(G1) contained in L(G) - {e}
Again, we prove something stronger so that we can perform the inductive step.
Base (n = 1)
If n=1, then A => x in T+ in one step, so x in P1(A). By the construction of P1, if x=a1a2...an in T+, then there must be strings of nullable variables W0, W1,..., Wn such that alpha = W0.a1.W1.a2.W2...an.Wn in P(A). In that case, A =>_G alpha =>*_G a1a2...an = x by deriving e from each of the strings of nullable variables, Wi.
IH
Assume that K > 0 and that for any x in T+ and variable A in V, (A =>K_G1 x) => (A =>*_G x), i.e., if A derives x in K steps in G1, then A also derives x in G.
IS
Let x in T+ and variable A in V such that A =>K+1_G1 x. Consider the first production of this derivation, A =>_G1 X1X2...Xn, where Xi in (V U T) for i = 1,2,...,n. It must be the case that x=x1x2...xn, where xi in T+, and where Xi =>*_G1 xi if Xi in V or Xi = xi otherwise. Since the whole derivation has but K+1 steps, and one step is needed to obtain the sentential form X1X2...Xn, if Xi in V, Xi =>ji_G1 xi (Xi derives xi in ji steps in G1) with ji <= K. Thus, for each Xi, either the conditions for the IH hold, or Xi in T. For each Xi in V Xi =>*_G xi as well by the IH. For X1X2...Xn in P1(A), there must be strings of nullable variables W0, W1,..., Wn such that W0X1W1X2...XnWn in P(A) by construction of P1(A). Thus A =>_G W0X1W1X2...XnWn =>*_G ex1ex2e...exne = x. Hence the IH holds for K+1 and is thus true for all K=1,2,.... In particular, if x in L(G1), then S =>n_G1 x for some n, so S =>*_G x, and x in L(G1).
QED Claim 2
QED