R. E. Newman-Wolfe, University of Florida
Last modified 2/20/95.
Regular Languages
Regular Expressions
Regular languages are those languages that may be generated from a
regular expression.
Regular expressions and the
languages
they generate are defined by the following rules.
- 0 is a regular expression, generating the empty language 0 = {}.
- e (the empty string) is a regular expression,
generating the language {e}.
- For each symbol a in the alphabet A, a is a regular expression
generating the language {a}.
- For regular expressions r and s, corresponding to languages
L_r and L_s, each of the following is a regular expression,
given in order of precedence.
- (r*), generating (L_r)*
- (rs), generating L_r L_s
- (r+s), generating L_r U L_s
- Only those expressions that can be constructed by the rules
above are regular expressions.
If parentheses are absent, then operations are applied left to right in
order of precedence, so a+b*c+d = (a+((b*)c))+d.
A regular language has many regular expressions corresponding to it.
One way to show that two regular expressions, r and s, correspond to the
same language is to show that any string generated by r can be generated
by s, and that any string that can be generated by s can also be generated
by r. This shows both that L_r is contained in L_s and that L_s is
contained in L_r, so they must be equal. Another way is to develop
identities by which a regular expression may be rewritten. If by application
of the rewriting rules, two expressions may be made identical, then the two
expressions correspond to the same language. Some example identities are
given below.
- a*(a+e) = a*
- a*a* = a*
- a* + b* = b* + a*
- (a* + b*)* = (a+b)*
- (a*b*)* = (a+b)*
Closure
Closure properties allow us both to build a desired regular language and
to show that a given language is regular. Alternatively, if we can show
that by using only operations that preserve regularity, a language L in
question can be used to build a language known to be non-regular, then we
can conclude that L is non-regular.
Theorem
The class of regular languages is closed under union, concatenation and *.
Proof
Follows directly from the definitions of regular language
and regular expression.
Theorem
All finite languages are regular.
Proof
Induction on the number of elements in the language, and induction on the
length of the string in a single element language prove this.
Theorem
The class of regular languages is closed under intersection, complementation,
and difference.
Proof
The proofs of these statements is most easily accomplished by using
the corresponding
finite state machines
to build new FSAs that recognize the new languages.
For intersection, the new FSA for its states pairs of states from the original
two machines. Its start state is the pair of starts states, and a state is
final if it consists of a pair of final states from the original machines.
Complementation is simply the reassignment of final states, F', so that for
original final states F and state set Q, F' = Q-F, the complement of F in Q.
Closure under difference may be shown using the two previous results and by
noting that A - B = A ^ B'.
This document is
copyright 1995
by Richard E. Newman-Wolfe.
Send comments to nemo@cis.ufl.edu