Hints on working with formal languages, grammars, expressions, automata, etc.

This list is being constructed for the class named above, but some items apply to a much more general set of endeavors. Please provide feedback and add to this list if you will.

Always try to work an exercise before getting help on it, but do not hesitate to get help even if you believe you have solved it successfully. Be prepared to give help to others, including offering a critique of their proofs and solutions. This hones your analytical skills, and helps you to obtain a deeper understanding of the material.
For each proof, clearly (and formally) state what it is you are attempting to prove. The formal statement often gives some clue as to how to approach the problem, and is in some cases half the problem itself. Verify that you have done what you set out to do when you think you are done.
When you are creating a proof, see if you need all the power you bring to it initially. For example, you may attempt to prove something using strong induction when only weak induction is needed, or perhaps you don't need induction at all. In any case, reduce the power if possible.
When devising an inductive proof, make sure that your base case and your inductive hypothesis cover what you need when proving the inductive step. Sometimes you will need to have more than one base case for the induction to work.
Use previous results if at all possible in your proofs. This helps reveal the utility of the earlier results and may increase your understanding of them, as well as shorten your proof. Be sure that they apply before you use them, though. In particular, for this class, make a distinction between languages, expressions, grammars, and machines and the theorems that address each.
If you attempt to prove some property P for some set A, that is, you wish to prove P(A) is true, then don't waste your time proving anything about some other set B. In particular, showing that P(A') is false for the complement of A gets you nowhere.
When constructing an FSA or a regular expression for a language, test it out on several positive and negative example strings. In fact, it is best if you sketch a proof that it does what you want in your head, but almost all of the incorrect constructions I see have short counterexamples.
When dealing with a function, be sure you know its domain and range; writing f:A -> B is a good idea to make that manifest. Then make sure you understand the properties f has, or should have.
If defining a function on a language, you need to describe what happens to strings in the language; your base will be e and a in S, your recursion will be on x= wa where x is in S* and a is in S.
When proving something about a language inductively, you may go to the machine representation, the grammar representation or any equivalent. If you stay in the world of languages, then your induction will usually be on the length of w, where w is a string in L. The base will be for e and for a in S, and the inductive step will break the longer string (about which you wish to prove the property) into two or more shorter strings. Usually to prove something about a whole class of languages, you will need to go to another representation.
If defining a function on a string, you need to describe what happens to a generic string. This is similar to the case for languages, except that the language is S*.
If defining a function on a regular expression, you need to describe what happens to a regular expression; your bases will be 0, e, and a in S, your recursion will be for r.s, r+s and for r*, where r and s are regular expressions.
When proving something about regular expressions inductively, then your induction will usually be on the number of operators in the regular expression, |r|. The base case, where |r| = 0, covers the three bases for building regular expressions, r=0, r=e and r=a for a in S. The inductive step will break a regular expression r, |r| > 0, into components with an operator, r=s*, r=s+t, r=s.t. Here, you must use strong induction since except for the case of r=s*, you do not know what |s| and |t| are.
If defining a function on a machine, you should start with the formal tuples for the starting machine(s), then construct the new machine(s) using the state(s), alphabet(s), starting state(s), transition function(s) and accepting set(s) as building blocks. To then reason about the behavior of the new machine(s) (as well as the old one(s)), some way of describing their computation is needed. For FSAs this takes the form of the extensions to the transition function. For 2DFSAs, PDAs, TMs and other machines, this includeds instantaneous descriptions, which describe the input state (read head(s) position(s) and input tape contents), the finite state control state, and the internal storage state. If the machine has output, this may also be included. The transition function is used to create the "turnstile" function that describes how the machine gets from one ID to the next, and the extensions to the turnstile function are used to reason about the machine's behavior.
When proving something about a machine inductively, your induction may be on the number of IDs through which the machine has passed, or the length of the input string, or the amount of storage or time used. What you use and the appropriate base cases and inductive steps will depend on the machine type, and what you want to show.
When discovering a regular expression that corresponds to the language accepted by an FSA (or NFSA, etc.), I find it useful to do a the following things.
- First, build the languages for reaching each state from q_0 recursively, starting with q_0 itself. This means getting to each predecessor state, taking the step from the predecessor state to the desired state, then traversing the loops that get the machine back to that state without going through any of the earlier states. This is equivalent to saying that the machine may enter the state several times, but at some point it enters the earlier states for the last time, then enters this state. All remaining computation is carried out in the higher numbered states. (Of course, this in turn means that the states should numbers consistent with the order in which you may encounter them, starting from q_0).
- Second, look at the structure of the machine. This helps with arriving at an English description of the language it recognizes as well, and may help you minimize (or at least reduce) the regular expression for it. If the machine has a "skeleton", a sequence of states through which it progresses only on a particular string, then this string is likely to be of some significance. For an FSA, if there are no "failure" arcs back to the starting or intermediate states, but they all go to a "dead" state, then suspect that this is a language looking for a specific prefix. If there are no failure states, but an accepting state that is impossible to leave, then suspect a language looking for a substring. If the accepting state has arcs back into the skeleton, then suspect a language looking for a specific suffix. In both of these latter cases, you can partially confirm this by checking for "failure" arcs that go to sensible places in the skeleton, that is if there is a suffix of the pattern seen so far (including the label on the failure arc) that is a prefix of the skeleton pattern, and the arc goes to that prefix state, then this is supporting evidence.
- Try to label the states with meaningful labels (not just q_0, q_1, etc., but e, 1*, abbaa*, etc.). Remember that FSAs must be able to distinguish strings that are distinguishable with respect to the language they recognize, and this is done with states. Try to label each state with a canonical member of the equivalence class of strings that it "represents."
- It is OK to draw pictures to get an intuition, but do not let the picture constrain your thinking - always check whether that is all that can happen. The best way to do this is through proof that the only kind of thing that can happen is what you have drawn. If you have trouble proving that, perhaps there are other things that can happen, and you should account for them.
When trying to find a description of the language corresponding to a regular expression, look for clues as to what is important and what is not. Where there are *'s, this usually indicates less important things. Look for length constraints, content constraints, etc. Generate a few small strings, and try to characterize them, then check to see if that works or generalizes.

Send comments to nemo@cis.ufl.edu