R. E. Newman-Wolfe, University of Florida
Last modified 2/20/95.
Introduction
We will study the relationships between languages, machines, and
grammars. A language is a set of strings over a finite alphabet,
where a string is the concatenation of zero or more symbols from
the alphabet. Machines will always include a means of reading
input from an input tape, one symbol at a time, and will contain
some amount of finite state control. Additional storage and types
of storage, additional heads and the ability to move the head(s)
in different ways provide for variations in the machine models,
and may allow machines to perform harder tasks or perform the same
tasks faster. Machines with output capabilities may also be considered
as generators of languages (they output exactly the strings of the
language delimited in some fashion) or computers (given an input
string, the machine may produce an output string and halt, or if the
function is not defined for that input string, it may not halt).
For the most part, we will consider machines as language recognizers,
that is, given an input string, the machine will execute for some number
of steps and halt in an accepting state or not (it may not halt, or it
may halt in a non-accepting state). Grammars consist of two sets of
symbols, terminals (corresponding to the language's alphabet) and
non-terminals (which act as variables), along with rules for how to
start and how to replace non-terminals with other strings. Exercising
a sequence of these production rules from the start symbol may produce
a string of terminals, in which case that string is said to be in the
language of that grammar and the sequence is said to be a derivation of
that string. Determining if a string is in the language generated by a
grammar consists of parsing the string, that is, finding a derivation
for the string. These three ways of defining languages (directly, or
by describing a recognizing machine or a generating grammar) are
closely related, and the nature of their relationships will constitute
a major portion of our study.
Additional properties are also of interest, and by establishing the
relationships between languages, machines and grammars, we obtain tools
to explore these other properties. Languages, machines and grammars may
be organized into classes depending upon properties that they have, and
in many cases, the classes of languages will correspond to classes of
machines and classes of grammars. Determining to which class a particular
language belongs is one type of question we will often ask. We will also
define operations on languages, by which one or more languages may be
combined or altered in a systematic way to produce another language
(perhaps the same one). A particularly interesting set of questions has
to do with closure properties of classes of languages. A class of languages
is closed under some operation if, given languages in the class,
application of the operation always produces another language (perhaps the
same one) within the same class. There are subtle but important differences
between questions about languages, machines, grammars and classes of these.
The student should gain an appreciation of these differences in addition
to acquiring techniques to answer the questions that arise.
Special questions of great importance include whether a language is
computable (that is, is there any machine that can recognize it among the
fleet of machine models we know), whether a language is recognizable by
a machine that always halts (decidability), and whether a language has an
``efficient'' machine (one that operates in polynomial time: tractability).
Once we have gained a solid foundation in the concepts of language theory,
we will explore the structure of classes of languages.
Basics
Below are some preliminary definitions provided to remind the student of
the basic mathematics we will need in our explorations. This summary will
be rather abstract - refer to the book for examples.
Sets
In what follows, I must apologize for overloading some symbols and classes
of symbols (a limitation of the fonts available).
We will always assume the existence of a universal set, U, relative to
which the complement operation will be defined. A set consists of
some number (zero, a finite number, a countably infinite number or an
uncountably infinite number) of elements. A set may be defined by
listing the elements (if it is finite), by describing some means of
generating or recognizing the elements, or by a ``set former''. The
latter is of the form
S = {x in A | Q(x)},
read as ``S consists of x in A such that Q of x is true (or simply, Q of x).''
Q(x) is a predicate that describes what must be true of x in order for it
to be in the set S. Standard sets of numbers we will use include R (the
real numbers), Z (the integers), N (the naturals, N={1,2,3,...}), W (the
whole numbers, W = {0,1,2,...}) and for each of these, S+ (the positive
members of the set indicated). Some set operations are defined below.
- Complement: S' = {x in U | x is not in S}
- Difference: A \ B = {x in U | x is in A and x is not in B}
- Union: A U B = {x in U | x is in A or x is in B}
- Intersection: A ^ B = {x in U | x is in A and x is in B}
- Disjoint: A is disjoint from B iff A ^ B = {} (the empty set)
- Power set: P(S) = { A | A is a subset of S}
- Product: A X B = {(x,y) | x is in A and y is in B}
Some properties of these operations are as follows.
- Commutative: A U B = B U A, A ^ B = B ^ A.
- Associative: (A U B) U C = A U (B U C),
(A ^ B) ^ C = A ^ (B ^ C)
- Distributive: A U (B ^ C) = (A U B) ^ (A U C),
A ^ (B U C) = (A ^ B) U (A ^ C).
- De Morgan's Laws: (A U B)' = A' ^ B', (A ^ B)' = A' U B'.
The size of a set S is indicated by |S| or by #S, and is the number of
elements in the set; the sketch following this list checks the identities
below. If sets A and B are disjoint, then
- |A U B| = |A| + |B|
- |A ^ B| = 0
and for any finite sets A and B, disjoint or not,
- |A X B| = |A| * |B|
- |P(A)| = exp(2,|A|)
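To make these definitions concrete, here is a small Python sketch; the
universal set U and the sets A, B and C are invented examples, used only
to exercise the operations, laws and size identities above.

    from itertools import combinations

    # Invented example sets; U is a toy universal set.
    U = set(range(10))
    A = {1, 2, 3}
    B = {4, 5}                                # disjoint from A
    C = {3, 4}

    complement_A = U - A                      # A' relative to U
    assert A | B == B | A                     # union is commutative
    assert A | (B & C) == (A | B) & (A | C)   # distributive law
    assert U - (A | C) == (U - A) & (U - C)   # De Morgan's law

    assert len(A | B) == len(A) + len(B)      # |A U B| = |A|+|B| (disjoint)
    assert len(A & B) == 0                    # |A ^ B| = 0 (disjoint)

    product = {(x, y) for x in A for y in B}  # A X B
    assert len(product) == len(A) * len(B)    # |A X B| = |A| * |B|

    power_set = [set(s) for k in range(len(A) + 1)
                 for s in combinations(A, k)] # P(A)
    assert len(power_set) == 2 ** len(A)      # |P(A)| = exp(2,|A|)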
Logic
Logic deals with statements and their truth sets. A statement may have
variables in it that may be bound or free. Binding follows scoping rules
much as it does in statically scoped programming languages, with the
addition of the two quantifiers, forall (upside-down A) and exists (backwards
E). The order of quantifiers matters a great deal: the statements S1 and
S2 below are not at all the same. Let H = {all humans, living and dead}
and let
m:H -> H be m(h) = the biological mother of h.
- S1: Forall x in H there exists y in H such that y=m(x).
- S2: There exists y in H forall x in H such that y=m(x).
S1 states that every human has a (unique) mother, while S2 states that
there is some poor human who is the mother of every human that ever lived,
herself included.
Each statement has an associated truth set: the set of assignments of
values to its variables for which the statement is true. For S1, the
truth set is T(S1) = {(x, m(x)) | x is in H}. For
S2, the truth set is empty. A statement is said to be satisfied by an
assignment if the assignment is in the truth set, that is, if substitution
of the values in the assignment for the variables in the statement produces
a true statement.
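A small finite model makes the effect of quantifier order concrete. The
following Python sketch uses an invented three-person stand-in for H and m.

    # Invented stand-ins for H and m; m[h] = the mother of h.
    H = {"ann", "bob", "cam"}
    m = {"ann": "cam", "bob": "cam", "cam": "ann"}

    # S1: forall x in H there exists y in H such that y = m(x) -- true.
    S1 = all(any(y == m[x] for y in H) for x in H)

    # S2: there exists y in H forall x in H such that y = m(x) -- false.
    S2 = any(all(y == m[x] for x in H) for y in H)
    assert S1 and not S2

    # The truth set of "y = m(x)" over H x H is exactly {(x, m(x))}.
    T = {(x, y) for x in H for y in H if y == m[x]}
    assert T == {(x, m[x]) for x in H}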
One statement is said to imply another statement, written S1 => S2,
if T(S1) is contained in T(S2). Two statements are equivalent, written
S1 <=> S2, iff S1 => S2 and S2 => S1 (their truth sets are equal).
Operators on statements include AND (upside down V, for which I will use ^),
OR (V), and NOT. Thus P => Q is the same as (NOT P) V Q.
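A quick Python check of the truth table of P => Q, written as (NOT P) V Q:

    # P => Q, computed as (NOT P) OR Q, is false exactly when P is
    # true and Q is false.
    def implies(p, q):
        return (not p) or q

    table = {(p, q): implies(p, q)
             for p in (False, True) for q in (False, True)}
    assert table == {(False, False): True, (False, True): True,
                     (True, False): False, (True, True): True}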
Relations
A binary relation R is just a set of ordered pairs, R contained in A X B.
Often A and B are the same set. We write a R b if and only if (iff) (a,b)
is in R. Some properties a relation may have are listed below.
- Reflexive: for all a, a R a.
- Antireflexive: for all a, NOT (a R a).
- Symmetric: for all a and b, a R b => b R a.
- Antisymmetric: for all a and b, a R b and b R a => a = b.
- Asymmetric: for all a and b, a R b => NOT (b R a).
- Transitive: for all a, b, and c,
a R b and b R c => a R c.
- Equivalence relation: R is an equivalence relation if
it is reflexive, symmetric and transitive (RST).
Equivalence relations are especially important, and they partition a set
into disjoint parts. If a is an element of the base set A over which the
relation is defined, then [a]_R is the equivalence class of a under relation
R, [a]_R = {b in A | a R b}. We will drop the R subscript if the relation
is understood from context. For any a, b in A, either [a] = [b] or
[a] ^ [b] = {}: two classes are either equal or disjoint, as the sketch
below illustrates.
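For instance, congruence mod 3 is an equivalence relation that partitions
{0, ..., 8} into three classes. The Python sketch below (an invented
example) checks that the classes cover the set and are pairwise equal or
disjoint.

    # a R b iff a = b (mod 3): reflexive, symmetric and transitive.
    A = set(range(9))

    def R(a, b):
        return a % 3 == b % 3

    classes = {frozenset(b for b in A if R(a, b)) for a in A}
    assert len(classes) == 3                      # three parts
    assert set().union(*classes) == A             # the parts cover A
    for X in classes:
        for Y in classes:
            assert X == Y or X & Y == frozenset() # equal or disjoint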
Functions
A function is a special type of relation, one for which the second element
of each pair is uniquely determined by the first element. If f:A -> B
is a function from domain A into codomain B, then for all (a, b) in f and
(a', b') in f, if a = a' then b = b' (the second element is unique). If
(a,b) is in f, we may write f(a) = b. If for every element of A, there is
a b in B such that (a,b) is in f, then f is a total function, otherwise f
is a partial function (and there is at least one element of A for which f
is not defined). The range R of f, given a domain A, is the subset of the
codomain B consisting of the images of the elements of A under f. That is, the range
R = {b in B | there is an a in A such that (a,b) is in f}.
Depending on the domain and the codomain, properties a function
f:A -> B may have include the following.
- Partial: if there is some a in A such that there is no
b in B such that (a,b) is in f (i.e., f is not defined at a).
- Total: if f is defined on all of A (is not partial).
- Injection (one-to-one): f:A >-> B iff for all (a,b) and (a',b')
in f, b = b' => a = a', that is, for every element of B
there is at most one pre-image in A.
- Many-to-one: f:A -> B is many-to-one if there exist a, a' in A
with a <> a' and f(a) = f(a'), that is, there are at least two
distinct elements of the domain that have the same image
under f.
- Surjection (onto): f:A ->> B iff for all b in B there is an a
in A such that f(a) = b, that is, the codomain is the range.
- Bijection: f:A >->> B iff f is both one-to-one and onto.
Bijections are important because they are invertible (f inverse is a
function and f inverse inverse is f), and they compose (the composition of
two bijections is also a bijection). It cannot be stressed too strongly
that these properties are relative to the domain and codomain defined.
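The following Python sketch checks these properties for an invented
function on small finite sets, and illustrates that onto-ness (and hence
bijectivity) depends on the declared codomain.

    # An invented f : A -> B, represented as a dict of pairs.
    A = {1, 2, 3}
    B = {"x", "y", "z", "w"}
    f = {1: "x", 2: "y", 3: "z"}

    def is_total(f, A):      return set(f) == A
    def is_injective(f):     return len(set(f.values())) == len(f)
    def is_surjective(f, B): return set(f.values()) == B

    assert is_total(f, A) and is_injective(f)
    assert not is_surjective(f, B)     # "w" has no pre-image

    # The same set of pairs is a surjection (and so a bijection)
    # onto the smaller codomain {"x", "y", "z"}.
    assert is_surjective(f, {"x", "y", "z"})
    f_inv = {b: a for a, b in f.items()}   # a bijection inverts
    assert all(f_inv[f[a]] == a for a in A)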
Languages
Languages are sets of strings over a finite alphabet. An alphabet A is
just a set of distinct symbols, A = {a_1, a_2, ... a_k}, and a string is
the concatenation of zero or more symbols. The length of a string s is
the number of symbols in the string, |s| = k if s = a_i1 a_i2 ... a_ik.
Formally, a string of length k may be thought of as a function from
{1, 2, ..., k} into A. The null string, denoted by e here and by the
Greek letter epsilon or lambda elsewhere, is the string of length zero,
or the null function (an empty set of ordered pairs).
A* = {s | s is a string over A} is the set of all strings over alphabet A.
- A* = {s = a_1 a_2 ... a_k | k >= 0 and for all i, 0 < i <= k, a_i is in A}
A language L over alphabet A is just a subset of A*, that is, an element of P(A*).
Two important language operations are defined below.
- Concatenation:
L_1 L_2 = {s = s_1 s_2 | s_1 is in L_1 and s_2 is in L_2}.
- Kleene closure:
L* = {s = s_1 s_2 ... s_k | k >= 0 and for all i, 0 < i <= k, s_i is in L}.
The Kleene closure may be defined formally by building on sets L^k defined
in a manner similar to that for strings, as a function from {1,2,...,k}
into L.
- L^k = {s = s_1 s_2 ... s_k | for all i, 0 < i <= k, s_i is in L}
- L* = L^0 U L^1 U ... (the union over all k in W of L^k).
Note that the Kleene closure of a language always contains the null string,
e, since e as a string corresponds to the null function, which is the only
function in L^0, when it is considered as the set of all functions from
{} into L. This is similar to the notion of the power set of any set always
including the empty set. In particular, note that {}* = { e }.
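These operations are easy to experiment with when a language is
represented as a finite set of Python strings. The sketch below uses an
invented example language, and necessarily truncates the infinite union
defining L* at a finite number of powers.

    # Languages as finite sets of strings; e is the empty string "".
    def concat(L1, L2):
        return {s1 + s2 for s1 in L1 for s2 in L2}

    def power(L, k):                 # L^k
        Lk = {""}                    # L^0 = {e}
        for _ in range(k):
            Lk = concat(Lk, L)
        return Lk

    def star(L, bound):              # L*, truncated at k <= bound
        result = set()
        for k in range(bound + 1):
            result |= power(L, k)
        return result

    L = {"a", "bb"}
    assert concat(L, L) == {"aa", "abb", "bba", "bbbb"}
    assert "" in star(L, 4)          # e is always in L*
    assert star(set(), 4) == {""}    # {}* = {e}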
Proofs
A proof is in essence a social concept: it is an argument or explanation
that is sufficiently detailed and logical that other smart people are
convinced of the truth of the statement to be proven. Proofs are often
best written in a hierarchical fashion, providing an outline
of the major steps, with substeps within each step justifying that step.
A step that is obvious to someone may be scanned, while one that is not
obvious may be expanded and investigated in greater detail. Formal proof
systems rely on axioms (statements taken to be true without proof) and
rules that allow one or more statements to produce new statements. We will
pursue proofs here that are informal (and I hope, not tedious) but
sufficiently detailed that there are no nagging doubts left in the reader.
When constructing a proof, the most essential step is to state clearly the
statement to be proven and any assumptions. Here formal set notation and
logic may be used to good advantage. When the proof is completed, the
reader should be able to identify how the argument supports the truth of
the statement, and ideally, the reader should gain some insight into the
nature of the structures involved. Proofs may be direct or indirect,
constructive or non-constructive, and if induction is used, it may be weak
or strong. Generally, the type of statement we wish to prove is of the
form P(x) => Q(y). Direct proofs start by assuming P(x) to be true, then
showing that Q(y) must follow. Constructive proofs actually bind x in P(x),
then show how to find a y that satisfies Q, based on x. Indirect proofs
assume that Q is false, then show that P must also be false (demonstrating
the contrapositive form of the statement: NOT Q(y) => NOT P(x)). Induction
allows proof for all elements of a countably infinite set: one starts with
one or more base case(s) and shows that the statement is true for the
base case(s). Once the appropriate base case(s) have been shown to be true,
the inductive step is taken. Here the inductive hypothesis, that the
statement is true for some element(s), is assumed, and one shows that
under that assumption the statement must also be true for the next element.
If only the truth of the statement for the immediately preceding element is
assumed, this is said to be weak induction, while if the truth of the
statement for all preceding elements may be assumed, this is said to be
strong induction. The exact elements for which the inductive hypothesis
must hold determine what base case(s) are necessary.
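As an illustration of the shape of weak induction, the following Python
sketch checks the base case and a finite run of inductive steps for the
claim that 1 + 2 + ... + n = n(n+1)/2. A finite check only illustrates
the structure of the argument, of course; it does not prove the statement
for all n.

    # P(n): 1 + 2 + ... + n == n(n+1)/2.
    def P(n):
        return sum(range(1, n + 1)) == n * (n + 1) // 2

    assert P(1)                      # base case

    # Inductive step, exercised for k = 1, ..., 99: whenever P(k)
    # holds, P(k + 1) holds as well.
    for k in range(1, 100):
        assert (not P(k)) or P(k + 1)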
This document is copyright 1995 by Richard E. Newman-Wolfe.
Send comments to nemo@cis.ufl.edu