R. E. Newman-Wolfe, University of Florida
Last modified 2/6/95.
Hints on working with formal languages, grammars, expressions, automata, etc.
This list is being constructed for the class named above, but some items
apply to a much more general set of endeavors. Please provide feedback
and add to this list if you will.
- Always try to work an exercise before getting help on it, but do
not hesitate to get help even if you believe you have solved it
successfully. Be prepared to give help to others, including
offering a critique of their proofs and solutions. This hones
your analytical skills, and helps you to obtain a deeper
understanding of the material.
- For each proof, clearly (and formally) state what it is you are
attempting to prove. The formal statement often gives some
clue as to how to approach the problem, and is in some cases
half the problem itself. Verify that you have done what you
set out to do when you think you are done.
- When you are creating a proof, see if you need all the power you
bring to it initially. For example, you may attempt to prove
something using strong induction when only weak induction is
needed, or perhaps you don't need induction at all. In any
case, reduce the power if possible.
- When devising an inductive proof, make sure that your base case and
your inductive hypothesis cover what you need when proving the
inductive step. Sometimes you will need to have more than one
base case for the induction to work.
- Use previous results if at all possible in your proofs. This helps
reveal the utility of the earlier results and may increase your
understanding of them, as well as shorten your proof. Be sure that
they apply before you use them, though. In particular, for this
class, make a distinction between languages, expressions, grammars,
and machines and the theorems that address each.
- If you attempt to prove some property P for some set A, that is,
you wish to prove P(A) is true, then don't waste your time
proving anything about some other set B. In particular, showing
that P(A') is false for the complement of A gets you nowhere.
- When constructing an FSA or a regular expression for a language,
test it out on several positive and negative example strings.
In fact, it is best if you sketch a proof that it does what
you want in your head, but almost all of the incorrect
constructions I see have short counterexamples.
- When dealing with a function, be sure you know its domain and range;
writing f:A -> B is a good idea to make that manifest. Then make
sure you understand the properties f has, or should have.
- If defining a function on a language, you need to describe what happens
to strings in the language; your base will be e and a in S, your
recursion will be on x= wa where x is in S* and a is in S.
- When proving something about a language inductively, you may go to the
machine representation, the grammar representation or any equivalent.
If you stay in the world of languages, then your induction will
usually be on the length of w, where w is a string in L. The base
will be for e and for a in S, and the inductive step will break the
longer string (about which you wish to prove the property) into two
or more shorter strings. Usually to prove something about a whole
class of languages, you will need to go to another representation.
- If defining a function on a string, you need to describe what happens
to a generic string. This is similar to the case for languages,
except that the language is S*.
- If defining a function on a regular expression, you need to describe
what happens to a regular expression; your bases will be 0, e, and
a in S, your recursion will be for r.s, r+s and for r*, where r and
s are regular expressions.
- When proving something about regular expressions inductively, then your
induction will usually be on the number of operators in the regular
expression, |r|. The base case, where |r| = 0, covers the three
bases for building regular expressions, r=0, r=e and r=a for a in S.
The inductive step will break a regular expression r, |r| > 0, into
components with an operator, r=s*, r=s+t, r=s.t. Here, you must use
strong induction since except for the case of r=s*, you do not know
what |s| and |t| are.
- If defining a function on a machine, you should start with the formal
tuples for the starting machine(s), then construct the new machine(s)
using the state(s), alphabet(s), starting state(s), transition
function(s) and accepting set(s) as building blocks. To then reason
about the behavior of the new machine(s) (as well as the old one(s)),
some way of describing their computation is needed. For FSAs this
takes the form of the extensions to the transition function. For
2DFSAs, PDAs, TMs and other machines, this includeds instantaneous
descriptions, which describe the input state (read head(s) position(s)
and input tape contents), the finite state control state, and the
internal storage state. If the machine has output, this may also be
included. The transition function is used to create the "turnstile"
function that describes how the machine gets from one ID to the next,
and the extensions to the turnstile function are used to reason about
the machine's behavior.
- When proving something about a machine inductively, your induction may
be on the number of IDs through which the machine has passed, or
the length of the input string, or the amount of storage or time
used. What you use and the appropriate base cases and inductive
steps will depend on the machine type, and what you want to show.
- When discovering a regular expression that corresponds to the language
accepted by an FSA (or NFSA, etc.), I find it useful to do a the
following things.
- First, build the languages for reaching each state from
q_0 recursively, starting with q_0 itself. This means getting to
each predecessor state, taking the step from the predecessor state
to the desired state, then traversing the loops that get the machine
back to that state without going through any of the earlier states.
This is equivalent to saying that the machine may enter the state
several times, but at some point it enters the earlier states for
the last time, then enters this state. All remaining computation is
carried out in the higher numbered states. (Of course, this in turn
means that the states should numbers consistent with the order in
which you may encounter them, starting from q_0).
- Second, look at the structure of the machine. This helps with
arriving at an English description of the language it recognizes as
well, and may help you minimize (or at least reduce) the regular
expression for it. If the machine has a "skeleton", a sequence of
states through which it progresses only on a particular string, then
this string is likely to be of some significance. For an FSA, if
there are no "failure" arcs back to the starting or intermediate
states, but they all go to a "dead" state, then suspect that this is
a language looking for a specific prefix. If there are no failure
states, but an accepting state that is impossible to leave, then
suspect a language looking for a substring. If the accepting state
has arcs back into the skeleton, then suspect a language looking
for a specific suffix. In both of these latter cases, you can
partially confirm this by checking for "failure" arcs that go to
sensible places in the skeleton, that is if there is a suffix of the
pattern seen so far (including the label on the failure arc) that is
a prefix of the skeleton pattern, and the arc goes to that prefix
state, then this is supporting evidence.
- Try to label the states with meaningful labels (not just q_0,
q_1, etc., but e, 1*, abbaa*, etc.). Remember that FSAs must be
able to distinguish strings that are distinguishable with respect to
the language they recognize, and this is done with states. Try to
label each state with a canonical member of the equivalence class of
strings that it "represents."
- It is OK to draw pictures to get an intuition, but do not let
the picture constrain your thinking - always check whether that is
all that can happen. The best way to do this is through proof that
the only kind of thing that can happen is what you have drawn. If
you have trouble proving that, perhaps there are other things that
can happen, and you should account for them.
- When trying to find a description of the language corresponding to a
regular expression, look for clues as to what is important and what
is not. Where there are *'s, this usually indicates less important
things. Look for length constraints, content constraints, etc.
Generate a few small strings, and try to characterize them, then
check to see if that works or generalizes.
This document is
copyright 1995
by Richard E. Newman-Wolfe.
Send comments to nemo@cis.ufl.edu