R. E. Newman-Wolfe, University of Florida
Last modified 2/20/95.
Introduction
We will study the relationships between languages, machines, and
grammars. A language is a set of strings over a finite alphabet,
where a string is the concatenation of zero or more symbols from
the alphabet. Machines will always include a means of reading
input from an input tape, one symbol at a time, and will contain
some amount of finite state control. Additional storage and types
of storage, additional heads and the ability to move the head(s)
in different ways provide for variations in the machine models,
and may allow machines to perform harder tasks or perform the same
tasks faster. Machines with output capabilities may also be considered
as generators of languages (they output exactly the strings of the
language delimited in some fashion) or computers (given an input
string, the machine may produce an output string and halt, or if the
function is not defined for that input string, it may not halt).
For the most part, we will consider machines as language recognizers,
that is, given an input string, the machine will execute for some number
of steps and halt in an accepting state or not (it may not halt, or it
may halt in a non-accepting state). Grammars consist of two sets of
symbols, terminals (corresponding to the language's alphabet) and
non-terminals (which act as variables), along with rules for how to
start and how to replace non-terminals with other strings. Exercising
a sequence of these production rules from the start symbol may produce
a string of terminals, in which case that string is said to be in the
language of that grammar and the sequence is said to be a derivation of
that string. Determining if a string is in the language generated by a
grammar consists of parsing the string, that is, finding a derivation
for the string. These three ways of defining languages (directly, or
by describing a recognizing machine or a generating grammar) are
closely related, and the nature of their relationships will constitute
a major portion of our study.
Additional properties are also of interest, and by establishing the
relationships between languages, machines and grammars, we obtain tools
to explore these other properties. Languages, machines and grammars may
be organized into classes depending upon properties that they have, and
in many cases, the classes of languages will correspond to classes of
machines and classes of grammars. Determining to which class a particular
language belongs is one type of question we will often ask. We will also
define operations on languages, by which one or more languages may be
combined or altered in a systematic way to produce another language
(perhaps the same one). A particularly interesting set of questions has
to do with closure properties of classes of languages. A class of languages
is closed under some operation if, given languages in the class,
application of the operation always produces another language (perhaps the
same one) within the same class. There are subtle but important differences
between questions about languages, machines, grammars and classes of these.
The student should gain an appreciation of these differences in addition
to acquiring techniques to answer the questions that arise.
Special questions of great importance include whether a language is
computable (that is, is there any machine that can recognize it among the
fleet of machine models we know), whether a language is recognizable by
a machine that always halts (decidability), and whether a language has an
``efficient'' machine (one that operates in polynomial time: tractability).
Once we have gained a solid foundation in the concepts of language theory,
we will explore the structure of classes of languages.
Basics
Below are some preliminary definitions provided to remind the student of
the basic mathematics we will need in our explorations. This summary will
be rather abstract - refer to the book for examples.
Sets
In what follows, I must apologize for overloading some symbols and classes
of symbols (a limitation of the fonts available).
We will always assume the existence of a universal set, U, relative to
which the complement operation will be defined. A set consists of
some number (zero, a finite number, a countably infinite number or an
uncountably infinite number) of elements. A set may be defined by
listing the elements (if it is finite), by describing some means of
generating or recognizing the elements, or by a ``set former''. The
latter is of the form
S = {x in A | Q(x)},
read as ``S consists of x in A such that Q of x is true (or simply, Q of x).''
Q(x) is a predicate that describes what must be true of x in order for it
to be in the set S. Standard sets of numbers we will use include R (the
real numbers), Z (the integers), N (the naturals, N={1,2,3,...}), W (the
whole numbers, W = {0,1,2,...}) and for each of these, S+ (the positive
members of the set indicated). Some set operations are defined below.
- Complement: S' = {x in U | x is not in S}
- Difference: A \ B = {x in U | x is in A and x is not in B}
- Union: A U B = {x in U | x is in A or x is in B}
- Intersection: A ^ B = {x in U | x is in A and x is in B}
- Disjoint: A is disjoint from B iff A ^ B = {} (the empty set)
- Power set: P(S) = { A | A is a subset of S}
- Product: A X B = {(x,y) | x is in A and y is in B}
Some properties of these operations are as follows.
- Commutative: A U B = B U A, A ^ B = B ^ A.
- Associative: (A U B) U C = A U (B U C),
(A ^ B) ^ C = A ^ (B ^ C)
- Distributive: A U (B ^ C) = (A U B) ^ (A U C),
A ^ (B U C) = (A ^ B) U (A ^ C).
- De Morgan's Laws: (A U B)' = A' ^ B', (A ^ B)' = A' U B'.
The size of a set S is indicated by |S| or by #S, and is the number of
elements in the set; the sketch following this list checks the identities
below. If sets A and B are disjoint, then
- |A U B| = |A| + |B|
- |A ^ B| = 0
and for any finite sets A and B, disjoint or not,
- |A X B| = |A| * |B|
- |P(A)| = exp(2,|A|)
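To make these definitions concrete, here is a small Python sketch; the
universal set U and the sets A, B and C are invented examples, used only
to exercise the operations, laws and size identities above.

    from itertools import combinations

    # Invented example sets; U is a toy universal set.
    U = set(range(10))
    A = {1, 2, 3}
    B = {4, 5}                                # disjoint from A
    C = {3, 4}

    complement_A = U - A                      # A' relative to U
    assert A | B == B | A                     # union is commutative
    assert A | (B & C) == (A | B) & (A | C)   # distributive law
    assert U - (A | C) == (U - A) & (U - C)   # De Morgan's law

    assert len(A | B) == len(A) + len(B)      # |A U B| = |A|+|B| (disjoint)
    assert len(A & B) == 0                    # |A ^ B| = 0 (disjoint)

    product = {(x, y) for x in A for y in B}  # A X B
    assert len(product) == len(A) * len(B)    # |A X B| = |A| * |B|

    power_set = [set(s) for k in range(len(A) + 1)
                 for s in combinations(A, k)] # P(A)
    assert len(power_set) == 2 ** len(A)      # |P(A)| = exp(2,|A|)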
Logic
Logic deals with statements and their truth sets. A statement may have
variables in it that may be bound or free. Binding follows scoping rules
much as it does in statically scoped programming languages, with the
addition of the two quantifiers, forall (upside-down A) and exists (backwards
E). The order of quantifiers matters a great deal: the statements S1 and
S2 below are not at all the same. Let H = {all humans, living and dead}
and let
m:H -> H be m(h) = the biological mother of h.
- S1: Forall x in H there exists y in H such that y=m(x).
- S2: There exists y in H forall x in H such that y=m(x).
S1 states that every human has a (unique) mother, while S2 states that
there is some poor human who is the mother of every human that ever lived,
herself included.
Each statement has an associated truth set: the set of assignments of
values to its variables for which the statement is true. For S1, the
truth set is T(S1) = {(x, m(x)) | x is in H}. For
S2, the truth set is empty. A statement is said to be satisfied by an
assignment if the assignment is in the truth set, that is, if substitution
of the values in the assignment for the variables in the statement produces
a true statement.
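A small finite model makes the effect of quantifier order concrete. The
following Python sketch uses an invented three-person stand-in for H and m.

    # Invented stand-ins for H and m; m[h] = the mother of h.
    H = {"ann", "bob", "cam"}
    m = {"ann": "cam", "bob": "cam", "cam": "ann"}

    # S1: forall x in H there exists y in H such that y = m(x) -- true.
    S1 = all(any(y == m[x] for y in H) for x in H)

    # S2: there exists y in H forall x in H such that y = m(x) -- false.
    S2 = any(all(y == m[x] for x in H) for y in H)
    assert S1 and not S2

    # The truth set of "y = m(x)" over H x H is exactly {(x, m(x))}.
    T = {(x, y) for x in H for y in H if y == m[x]}
    assert T == {(x, m[x]) for x in H}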
One statement is said to imply another statement, written S1 => S2,
if T(S1) is contained in T(S2). Two statements are equivalent, written
S1 <=> S2, iff S1 => S2 and S2 => S1 (their truth sets are equal).
Operators on statements include AND (upside down V, for which I will use ^),
OR (V), and NOT. Thus P => Q is the same as (NOT P) V Q.
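A quick Python check of the truth table of P => Q, written as (NOT P) V Q:

    # P => Q, computed as (NOT P) OR Q, is false exactly when P is
    # true and Q is false.
    def implies(p, q):
        return (not p) or q

    table = {(p, q): implies(p, q)
             for p in (False, True) for q in (False, True)}
    assert table == {(False, False): True, (False, True): True,
                     (True, False): False, (True, True): True}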
Relations
A binary relation R is just a set of ordered pairs, R contained in A X B.
Often A and B are the same set. We write a R b if and only if (iff) (a,b)
is in R. Some properties a relation may have are listed below.
- Reflexive: for all a, a R a.
- Antireflexive: for all a, NOT (a R a).
- Symmetric: for all a and b, a R b => b R a.
- Antisymmetric: for all a and b, a R b and b R a => a = b.
- Asymmetric: for all a and b, a R b => NOT (b R a).
- Transitive: for all a, b, and c,
a R b and b R c => a R c.
- Equivalence relation: R is an equivalence relation if
it is reflexive, symmetric and transitive (RST).
Equivalence relations are especially important, and they partition a set
into disjoint parts. If a is an element of the base set A over which the
relation is defined, then [a]_R is the equivalence class of a under relation
R, [a]_R = {b in A | a R b}. We will drop the R subscript if the relation
is understood from context. For any a, b in A, either [a] = [b] or
[a] ^ [b] = {}: two classes are either equal or disjoint, as the sketch
below illustrates.
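For instance, congruence mod 3 is an equivalence relation that partitions
{0, ..., 8} into three classes. The Python sketch below (an invented
example) checks that the classes cover the set and are pairwise equal or
disjoint.

    # a R b iff a = b (mod 3): reflexive, symmetric and transitive.
    A = set(range(9))

    def R(a, b):
        return a % 3 == b % 3

    classes = {frozenset(b for b in A if R(a, b)) for a in A}
    assert len(classes) == 3                      # three parts
    assert set().union(*classes) == A             # the parts cover A
    for X in classes:
        for Y in classes:
            assert X == Y or X & Y == frozenset() # equal or disjoint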
Functions
A function is a special type of relation, one for which the second element
of each pair is uniquely determined by the first element. If f:A -> B
is a function from domain A into codomain B, then for all (a, b) in f and
(a', b') in f, if a = a' then b = b' (the second element is unique). If
(a,b) is in f, we may write f(a) = b. If for every element of A, there is
a b in B such that (a,b) is in f, then f is a total function, otherwise f
is a partial function (and there is at least one element of A for which f
is not defined). The range R of f, given a domain A, is the subset of the
codomain B consisting of the images of the elements of A under f. That is, the range
R = {b in B | there is an a in A such that (a,b) is in f}.
Depending on the domain and the codomain, properties a function
f:A -> B may have include the following.
- Partial: if there is some a in A such that there is no
b in B such that (a,b) is in f (i.e., f is not defined at a).
- Total: if f is defined on all of A (is not partial).
- Injection (one-to-one): f:A >-> B iff for all (a,b) and (a',b')
in f, b = b' => a = a', that is, for every element of B
there is at most one pre-image in A.
- Many-to-one: f:A -> B is many-to-one if there exist a, a' in A
with a <> a' and f(a) = f(a'), that is, there are at least two
distinct elements of the domain that have the same image
under f.
- Surjection (onto): f:A ->> B iff for all b in B there is an a
in A such that f(a) = b, that is, the codomain is the range.
- Bijection: f:A >->> B iff f is both one-to-one and onto.
Bijections are important because they are invertible (f inverse is a
function and f inverse inverse is f), and they compose (the composition of
two bijections is also a bijection). It cannot be stressed too strongly
that these properties are relative to the domain and codomain defined.
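The following Python sketch checks these properties for an invented
function on small finite sets, and illustrates that onto-ness (and hence
bijectivity) depends on the declared codomain.

    # An invented f : A -> B, represented as a dict of pairs.
    A = {1, 2, 3}
    B = {"x", "y", "z", "w"}
    f = {1: "x", 2: "y", 3: "z"}

    def is_total(f, A):      return set(f) == A
    def is_injective(f):     return len(set(f.values())) == len(f)
    def is_surjective(f, B): return set(f.values()) == B

    assert is_total(f, A) and is_injective(f)
    assert not is_surjective(f, B)     # "w" has no pre-image

    # The same set of pairs is a surjection (and so a bijection)
    # onto the smaller codomain {"x", "y", "z"}.
    assert is_surjective(f, {"x", "y", "z"})
    f_inv = {b: a for a, b in f.items()}   # a bijection inverts
    assert all(f_inv[f[a]] == a for a in A)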
Languages
Languages are sets of strings over a finite alphabet. An alphabet A is
just a set of distinct symbols, A = {a_1, a_2, ... a_k}, and a string is
the concatenation of zero or more symbols. The length of a string s is
the number of symbols in the string, |s| = k if s = a_i1 a_i2 ... a_ik.
Formally, a string of length k may be thought of as a function from
{1, 2, ..., k} into A. The null string, denoted by e here and by the
Greek letter epsilon or lambda elsewhere, is the string of length zero,
or the null function (an empty set of ordered pairs).
A* = {s | s is a string over A} is the set of all strings over alphabet A.
- A* = {s = a_1 a_2 ... a_k | k >= 0 and for all i, 0 < i <= k, a_i is in A}
A language L over alphabet A is just a subset of A*, that is, an element of P(A*).
Two important language operations are defined below.
- Concatenation:
L_1 L_2 = {s = s_1 s_2 | s_1 is in L_1 and s_2 is in L_2}.
- Kleene closure:
L* = {s = s_1 s_2 ... s_k | k >= 0 and for all i, 0 < i <= k, s_i is in L}.
The Kleene closure may be defined formally by building on sets L^k defined
in a manner similar to that for strings, as a function from {1,2,...,k}
into L.
- L^k = {s = s_1 s_2 ... s_k | for all i, 0 < i <= k, s_i is in L}
- L* = L^0 U L^1 U ... (the union over all k in W of L^k).
Note that the Kleene closure of a language always contains the null string,
e, since e as a string corresponds to the null function, which is the only
function in L^0, when it is considered as the set of all functions from
{} into L. This is similar to the notion of the power set of any set always
including the empty set. In particular, note that {}* = { e }.
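These operations are easy to experiment with when a language is
represented as a finite set of Python strings. The sketch below uses an
invented example language, and necessarily truncates the infinite union
defining L* at a finite number of powers.

    # Languages as finite sets of strings; e is the empty string "".
    def concat(L1, L2):
        return {s1 + s2 for s1 in L1 for s2 in L2}

    def power(L, k):                 # L^k
        Lk = {""}                    # L^0 = {e}
        for _ in range(k):
            Lk = concat(Lk, L)
        return Lk

    def star(L, bound):              # L*, truncated at k <= bound
        result = set()
        for k in range(bound + 1):
            result |= power(L, k)
        return result

    L = {"a", "bb"}
    assert concat(L, L) == {"aa", "abb", "bba", "bbbb"}
    assert "" in star(L, 4)          # e is always in L*
    assert star(set(), 4) == {""}    # {}* = {e}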
Proofs
A proof is in essence a social concept: it is an argument or explanation
that is sufficiently detailed and logical that other smart people are
convinced of the truth of the statement to be proven. Proofs are often
best written in a hierarchical fashion, providing an outline
of the major steps, with substeps within each step justifying that step.
A step that is obvious to someone may be scanned, while one that is not
obvious may be expanded and investigated in greater detail. Formal proof
systems rely on axioms (statements taken to be true without proof) and
rules that allow one or more statements to produce new statements. We will
pursue proofs here that are informal (and I hope, not tedious) but
sufficiently detailed that there are no nagging doubts left in the reader.
When constructing a proof, the most essential step is to state clearly the
statement to be proven and any assumptions. Here formal set notation and
logic may be used to good advantage. When the proof is completed, the
reader should be able to identify how the argument supports the truth of
the statement, and ideally, the reader should gain some insight into the
nature of the structures involved. Proofs may be direct or indirect,
constructive or non-constructive, and if induction is used, it may be weak
or strong. Generally, the type of statement we wish to prove is of the
form P(x) => Q(y). Direct proofs start by assuming P(x) to be true, then
showing that Q(y) must follow. Constructive proofs actually bind x in P(x),
then show how to find a y that satisfies Q, based on x. Indirect proofs
assume that Q is false, then show that P must also be false (demonstrating
the contrapositive form of the statement: NOT Q(y) => NOT P(x)). Induction
allows proof for all elements of a countably infinite set: one starts with
one or more base case(s) and shows that the statement is true for the
base case(s). Once the appropriate base case(s) have been shown to be true,
the inductive step is taken. Here the inductive hypothesis, that the
statement is true for some element(s), is assumed, and one shows that
under that assumption the statement must also be true for the next element.
If only the truth of the statement for the immediately preceding element is
assumed, this is said to be weak induction, while if the truth of the
statement for all preceding elements may be assumed, this is said to be
strong induction. The exact elements for which the inductive hypothesis
must hold determine what base case(s) are necessary.
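As an illustration of the shape of weak induction, the following Python
sketch checks the base case and a finite run of inductive steps for the
claim that 1 + 2 + ... + n = n(n+1)/2. A finite check only illustrates
the structure of the argument, of course; it does not prove the statement
for all n.

    # P(n): 1 + 2 + ... + n == n(n+1)/2.
    def P(n):
        return sum(range(1, n + 1)) == n * (n + 1) // 2

    assert P(1)                      # base case

    # Inductive step, exercised for k = 1, ..., 99: whenever P(k)
    # holds, P(k + 1) holds as well.
    for k in range(1, 100):
        assert (not P(k)) or P(k + 1)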
This document is copyright 1995 by Richard E. Newman-Wolfe.
Send comments to nemo@cis.ufl.edu