Regular Languages

Regular Expressions

Regular languages are those languages that may be generated from a regular expression. Regular expressions and the languages they generate are defined by the following rules.

0 is a regular expression, generating the empty language 0 = {}.
e (the empty string) is a regular expression, generating the language {e}.
For each symbol a in the alphabet A, a is a regular expression generating the language {a}.
For regular expressions r and s, corresponding to languages L_r and L_s, each of the following is a regular expression, given in order of precedence.
- (r*), generating (L_r)*
- (rs), generating L_r L_s
- (r+s), generating L_r U L_s
Only those expressions that can be constructed by the rules above are regular expressions.

If parentheses are absent, then operations are applied left to right in order of precedence, so a+b*c+d = (a+((b*)c))+d.

A regular language has many regular expressions corresponding to it. One way to show that two regular expressions, r and s, correspond to the same language is to show that any string generated by r can be generated by s, and that any string that can be generated by s can also be generated by r. This shows both that L_r is contained in L_s and that L_s is contained in L_r, so they must be equal. Another way is to develop identities by which a regular expression may be rewritten. If by application of the rewriting rules, two expressions may be made identical, then the two expressions correspond to the same language. Some example identities are given below.

a*(a+e) = a*
a*a* = a*
a* + b* = b* + a*
(a* + b*)* = (a+b)*
(a*b*)* = (a+b)*

Closure

Closure properties allow us both to build a desired regular language and to show that a given language is regular. Alternatively, if we can show that by using only operations that preserve regularity, a language L in question can be used to build a language known to be non-regular, then we can conclude that L is non-regular.

Theorem

The class of regular languages is closed under union, concatenation and *.

Proof

Follows directly from the definitions of regular language and regular expression.

Theorem

All finite languages are regular.

Proof

Induction on the number of elements in the language, and induction on the length of the string in a single element language prove this.

Theorem

The class of regular languages is closed under intersection, complementation, and difference.

Proof

The proofs of these statements is most easily accomplished by using the corresponding finite state machines to build new FSAs that recognize the new languages. For intersection, the new FSA for its states pairs of states from the original two machines. Its start state is the pair of starts states, and a state is final if it consists of a pair of final states from the original machines. Complementation is simply the reassignment of final states, F', so that for original final states F and state set Q, F' = Q-F, the complement of F in Q. Closure under difference may be shown using the two previous results and by noting that A - B = A ^ B'.

Send comments to nemo@cis.ufl.edu