The cracking of the Enigma code has been said to be the most important historical contribution of cryptanalysis [-]. It is well known that the efforts of the Bletchley Park cryptanalysis team (also called "crippies", led in part by Alan Turing) directly resulted in saving at least tens of thousands of lives and in shortening World War II by perhaps several years. An excellent review of this period in cryptology is given in Reference [-], with supplemental material in References [-], [-], and [-].
$$T_R(a(i)) = T_n\big(p_{n-1}, T_{n-1}\big(p_{n-2}, T_{n-2}\big(\cdots T_2\big(p_1, T_1(p_0, a(i))\big)\cdots\big)\big)\big), \quad i \in \mathrm{domain}(a), \tag{I}$$
where $a(i)$ denotes the i-th symbol typed on the keyboard, $p_0$ denotes the initial position of one or more rotors, and $T_R(a(i))$ is indicated by the lamp that illuminates the corresponding character of the displayed alphabet F, as shown in Figure 1.
Figure 1. Schematic rotor machine with n = 3 rotors and alphabet F = {A,B,C,D}.
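The composition in the equation for $T_R(a(i))$ can be sketched in a few lines of Python. The text does not say how a rotor's position enters $T_k$, so this sketch assumes a common model in which a rotor with wiring sigma (a permutation of symbol indices) turned to position p maps x to (sigma[(x + p) mod |F|] - p) mod |F|. The wirings in ROTORS and all function names are hypothetical.

```python
# Sketch of T_R(a(i)) = T_n(p_{n-1}, ... T_2(p_1, T_1(p_0, a(i))) ...).
# ASSUMPTION: a rotor with wiring sigma (a permutation of 0..|F|-1) at
# position p maps x to (sigma[(x + p) % |F|] - p) % |F|.

ALPHABET = "ABCD"   # F = {A,B,C,D}, as in Figure 1
N = len(ALPHABET)

# Hypothetical wirings for rotors 1, 2, 3 (each a permutation of 0..3).
ROTORS = [[1, 3, 0, 2], [2, 0, 3, 1], [3, 2, 1, 0]]


def rotor_map(sigma, p, x):
    """T_k(p, x): pass symbol index x through one rotor at position p."""
    return (sigma[(x + p) % N] - p) % N


def machine_map(rotors, positions, x):
    """Apply T_1 with p_0 first, then T_2 with p_1, ..., T_n with p_{n-1}."""
    for sigma, p in zip(rotors, positions):
        x = rotor_map(sigma, p, x)
    return x


def encrypt_symbol(symbol, positions):
    """Return the lamp letter T_R(a(i)) for one keyboard letter."""
    return ALPHABET[machine_map(ROTORS, positions, ALPHABET.index(symbol))]


print([encrypt_symbol(s, [0, 0, 0]) for s in ALPHABET])   # ['D', 'C', 'B', 'A']
```

Because each rotor map is a bijection at every position, the composite map is also a bijection, which is what makes decryption possible.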
Remark. In the WWII era, rotors were typically constructed of Bakelite or a similar dielectric material, with wires embedded inside the dielectric. Each wire implemented the map between two symbols, and together the wires made the rotor map a bijection.
Observation. If the rotors of a GRM rotate at the same speed and maintain constant angular offset between adjacent rotors, then the GRM implements a Caesar cipher. This can be proven by noting that
Remark. In order for the rotor machine to implement a long-period Vigenere cipher, the rotors must have different rotational speeds, such that an n-rotor machine has a maximal period $|F|^n$. That is, the position of the i-th rotor depends on the (i+1)-th rotor's position. This is the customary serial (odometer-like) gearing of most Enigma machines. Since there are no fixed rules for the way a rotor machine must be geared, many variations are possible, as discussed in the following section.
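The odometer gearing described above can be sketched as follows. The sketch tracks only rotor positions, not wirings, and stores them in a Python list whose last entry plays the role of the fastest rotor n; both are illustrative assumptions. It confirms that the positions cycle through all $|F|^n$ states before repeating, which is the source of the maximal period.

```python
# Odometer (serial) gearing: the fastest rotor moves on every character, and
# each slower rotor advances once per full revolution of the next faster one.


def odometer_step(positions, size):
    """Advance a list of rotor positions by one input character."""
    new = list(positions)
    i = len(new) - 1              # last entry = fastest rotor
    while i >= 0:
        new[i] = (new[i] + 1) % size
        if new[i] != 0:           # no carry: slower rotors stay put
            break
        i -= 1                    # carry into the next slower rotor
    return new


def position_period(n_rotors, size):
    """Count characters until the rotor positions return to the start state."""
    start = [0] * n_rotors
    state = odometer_step(start, size)
    steps = 1
    while state != start:
        state = odometer_step(state, size)
        steps += 1
    return steps


print(position_period(3, 4))   # 64, i.e. |F|^n with |F| = 4, n = 3
```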
A further addition was the Steckerboard, a manual plugboard not unlike a small telephone switchboard of the time. The Steckerboard first implemented a substitution, which Enigma's developers thought would render Enigma secure. Near the end of the war, there was an attempt to implement a transposition using the Steckerboard, a difficult goal because it required buffer memory (then available only through relays or mercury delay lines). The developers thought this would render the machine resistant to all cryptanalytic attacks. In the more usual Enigma configuration, with the reflector in place, not only was the number of rotors effectively doubled, but the Steckerboard substitution was also inverted at the end of the encryption sequence. An Enigma-like rotor machine is shown in Figure 2.
$$e_E(a(i)) = V_1\big(p_0, V_2\big(p_1, \cdots V_n\big(p_{n-1}, R\big(p_n, T_n\big(p_{n-1}, T_{n-1}\big(p_{n-2}, \cdots T_2\big(p_1, T_1(p_0, a(i))\big)\cdots\big)\big)\big)\big)\cdots\big)\big),$$
where $i \in \mathrm{domain}(a)$, R denotes the reflector at position $p_n$, and $V_j$ inverts $T_j$ at the same rotor position, with j = 1, ..., n. Note that the reflector must not map any symbol to itself (e.g., "A" |-> "A"): the return path would then retrace the forward path, and the symbol would emerge unchanged.
Remark. Adding the Steckerboard substitution perturbs $e_E$ as follows:
$$T_E(a(i)) = S^{-1}\big(e_E(S(a(i)))\big),$$
where $X = \mathrm{domain}(a)$ and $S : F \to F$ denotes the Steckerboard substitution. If the Steckerboard were to implement a transposition of the form $S : X \to X$, the preceding equation would become
$$T_E(a(i)) = e_E\big(a(S(i))\big)\big[S^{-1}(i)\big],$$
and encryption/decryption would be applied to blocks of $|X|$ or fewer symbols.
Figure 2. Schematic Enigma machine with n = 3 rotors and alphabet F = {A,B,C,D}, where dotted lines denote the return path following reflection, and the Steckerboard implements a substitution that includes "B" |-> "A" and "C" |-> "D".
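The Enigma-like transform $T_E = S^{-1} \circ e_E \circ S$ can be sketched in Python under the same kind of assumed rotor model (a wiring sigma at position p maps x to (sigma[(x + p) mod |F|] - p) mod |F|). The rotor, reflector, and Steckerboard wirings are hypothetical; the Steckerboard is chosen to include "B" |-> "A" and "C" |-> "D" as in Figure 2. The sketch lets you check two consequences of the reflector: at any fixed set of positions $T_E$ is its own inverse, so the same setting decrypts, and because R has no fixed points, no symbol is ever encrypted to itself.

```python
# Sketch of T_E(a(i)) = S^{-1}(e_E(S(a(i)))) for an Enigma-like machine.
# ASSUMPTIONS: the wiring-at-position model, the hypothetical wirings, and
# positions given as [p_0, ..., p_{n-1}, p_n] with p_n for the reflector.

ALPHABET = "ABCD"
N = len(ALPHABET)

ROTORS = [[1, 3, 0, 2], [2, 0, 3, 1], [3, 2, 1, 0]]  # T_1, T_2, T_3
REFLECTOR = [2, 3, 0, 1]   # A<->C, B<->D: an involution with no fixed points
STECKER = [1, 0, 3, 2]     # A<->B, C<->D; includes "B" |-> "A", "C" |-> "D"


def inverse(perm):
    """Return the inverse permutation of perm."""
    inv = [0] * len(perm)
    for i, j in enumerate(perm):
        inv[j] = i
    return inv


def at_position(perm, p, x):
    """Apply wiring perm, turned to position p, to symbol index x."""
    return (perm[(x + p) % N] - p) % N


def e_E(positions, x):
    """Forward through T_1..T_n, reflect with R(p_n, .), back through V_n..V_1."""
    rotor_positions, p_n = positions[:-1], positions[-1]
    stages = list(zip(ROTORS, rotor_positions))
    for sigma, p in stages:                  # T_1, ..., T_n
        x = at_position(sigma, p, x)
    x = at_position(REFLECTOR, p_n, x)       # R
    for sigma, p in reversed(stages):        # V_n, ..., V_1 invert T_n, ..., T_1
        x = at_position(inverse(sigma), p, x)
    return x


def T_E(positions, x):
    """Steckerboard substitution S, then e_E, then S^{-1}."""
    return inverse(STECKER)[e_E(positions, STECKER[x])]


positions = [0, 1, 2, 3]
print("".join(ALPHABET[T_E(positions, x)] for x in range(N)))
```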
However, the Steckerboard that was implemented as a substitution had minimal effect, since the inverse Steckerboard was applied to the rotors' output. Thus, it was the period of the cipher as generated by the rotors that caused the cryptanalytic search space complexity to be high.
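One way to make this claim concrete is a standard fact from permutation theory that the text does not state: for any fixed substitution S, the permutation $S^{-1} \circ e \circ S$ has exactly the same cycle structure as e. Since the map $e \mapsto S^{-1} \circ e \circ S$ is one-to-one, conjugating every permutation in the rotor-generated sequence by the same S also leaves that sequence's period unchanged. The permutations below are hypothetical.

```python
# Conjugating by a fixed substitution S leaves the cycle structure unchanged.


def inverse(perm):
    """Return the inverse permutation of perm."""
    inv = [0] * len(perm)
    for i, j in enumerate(perm):
        inv[j] = i
    return inv


def conjugate(s, e):
    """Return S^{-1} o e o S as a list: x -> S^{-1}(e(S(x)))."""
    s_inv = inverse(s)
    return [s_inv[e[s[x]]] for x in range(len(e))]


def cycle_type(perm):
    """Sorted list of the cycle lengths of a permutation given as a list."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = perm[x]
            length += 1
        lengths.append(length)
    return sorted(lengths)


e = [1, 2, 0, 4, 3]   # hypothetical rotor-core permutation: cycles (0 1 2)(3 4)
s = [4, 3, 2, 1, 0]   # hypothetical Steckerboard substitution
print(conjugate(s, e), cycle_type(e), cycle_type(conjugate(s, e)))
# [1, 0, 4, 2, 3] [2, 3] [2, 3]
```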
Remark. If the rotors do not move relative to each other like the wheels of an odometer (in which rotor n moves once per input character, rotor n-1 moves once per revolution of rotor n, and so on), then the effective period of the Vigenere cipher implemented in the rotor machine can be less than the theoretical maximum. In such cases, the pattern of application of the Caesar cipher implemented in a given rotor may be less regular. This is particularly true when the gear ratios between adjacent rotors are prime numbers. Such facts are important in cryptanalysis, as follows.
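One simple way to see how non-odometer gearing can shorten the period is sketched below. Its gearing model, in which every rotor moves on every character by its own fixed step, is an assumption chosen for illustration rather than a description of any particular machine. Under this model the position sequence repeats after the least common multiple of the individual rotor cycles, which always divides |F| and is therefore far below the odometer maximum of $|F|^n$.

```python
# Period of the rotor positions under a simple non-odometer gearing model.
# ASSUMPTION: every rotor moves on every character, rotor k by steps[k]
# positions, so the rotors turn at fixed relative speeds (gear ratios).
from math import gcd


def geared_period(steps, size):
    """Brute force: characters until all rotor positions return to zero."""
    start = tuple(0 for _ in steps)
    state = tuple(s % size for s in steps)   # positions after one character
    count = 1
    while state != start:
        state = tuple((p + s) % size for p, s in zip(state, steps))
        count += 1
    return count


def predicted_period(steps, size):
    """lcm over rotors of size / gcd(step, size)."""
    period = 1
    for s in steps:
        cycle = size // gcd(s, size)
        period = period * cycle // gcd(period, cycle)
    return period


print(geared_period((1, 2, 13), 26), 26 ** 3)   # 26 17576
```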
Lemma. $G = (S, \circ)$ is a group, where $\circ$ denotes functional composition.
Question. Is G an Abelian group? Answer: No, because functional composition is not commutative in general: there are maps f and g in S with $f \circ g \ne g \circ f$.
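A three-symbol counterexample is enough to show that composition of rotor maps need not commute. The two maps below are hypothetical rotor transforms over F = {A,B,C}, written as lists of symbol indices.

```python
# Composition of permutations (stored as lists) is not commutative.


def compose(f, g):
    """(f o g)(x) = f(g(x))."""
    return [f[g[x]] for x in range(len(g))]


a = [1, 0, 2]   # hypothetical rotor map: swaps A and B, fixes C
b = [0, 2, 1]   # hypothetical rotor map: fixes A, swaps B and C

print(compose(a, b), compose(b, a))   # [1, 2, 0] [2, 0, 1]
```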
Remark. The fact that G is not an Abelian group is important in practice, since this means that different rotors cannot be interchanged while preserving a given encryption transform. Additionally, the rotor initial position and current position become nontrivial considerations.
2) Assume that the plaintext to be determined from given ciphertext c has statistics Pr(b) similar to statistics Pr(a) of a given plaintext corpus. One can then perturb the adjacency matrix representations of the rotors such that Pr(b), where b is the rotor machine output (in decryption mode), approximates Pr(a), and b contains recognizable words or phrases.
3) The output of the preceding step is augmented manually to fill in missing letters in recognizable words, until the complete message emerges. In practical applications, one may only need to complete a portion of the message to obtain the required information.
The goal of this process is to produce rotor machine adjacency matrices that describe the transform which the rotor machine implements. The following theory is illustrative.
Assumption. Let the structure of a rotor over an alphabet F = {A,B,C,D,E} be as shown in Figure 3, below. The rotor transform Tr: F -> F can be expressed in terms of the graph
G(Tr) = {(A,C),(B,A),(C,E),(D,D),(E,B)}.
Figure 3. Schematic diagram of a rotor.
$$M_G(T_r) = \big\{((i,j), M(i,j)) : M(i,j) = 1 \text{ if } (h^{-1}(i), h^{-1}(j)) \in G(T_r), \text{ and } 0 \text{ otherwise}, \; i, j \in h(F)\big\}.$$
Figure 4. Adjacency matrix of the rotor in Figure 3.
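The adjacency matrix can be built directly from the graph $G(T_r)$. The text does not fix the index map h, so this sketch assumes h sends A, B, C, D, E to zero-based indices 0 through 4; the function names are hypothetical.

```python
# Adjacency matrix of the rotor in Figure 3 from its graph G(T_r).
# ASSUMPTION: h maps A, B, C, D, E to row/column indices 0, 1, 2, 3, 4.

F = "ABCDE"
GRAPH = [("A", "C"), ("B", "A"), ("C", "E"), ("D", "D"), ("E", "B")]


def adjacency_matrix(graph, alphabet):
    """M[i][j] = 1 if (h^{-1}(i), h^{-1}(j)) is in the graph, else 0."""
    h = {sym: k for k, sym in enumerate(alphabet)}
    size = len(alphabet)
    matrix = [[0] * size for _ in range(size)]
    for x, y in graph:
        matrix[h[x]][h[y]] = 1
    return matrix


def apply_matrix(matrix, alphabet, symbol):
    """Read T_r(symbol) off the row of the matrix that belongs to symbol."""
    row = matrix[alphabet.index(symbol)]
    return alphabet[row.index(1)]


M = adjacency_matrix(GRAPH, F)
for sym, row in zip(F, M):
    print(sym, row)
```

Because $T_r$ is a bijection, each row and each column contains exactly one 1, so M is a permutation matrix.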
Algorithm. Given the preceding theory and observation, we can now address the problem of semi-automatically determining the rotor machine's adjacency matrices, and thereby guessing the rotor configuration. The steps are as follows.
Step 1. Construct a plaintext corpus (50k to 100k characters of text). For example, you could save this document to an ASCII file, then read in the file, convert it to uppercase, and discard all characters not in the alphabet {A-Z}. This would be a useful technique for alphabet A only. Other methods would be required to filter the characters in alphabets B-E. Choose a subset of the plaintext corpus and encrypt it to form the "unknown" ciphertext.
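A minimal Python sketch of Step 1 follows. The function names and the choice of a contiguous random block as the subset are illustrative assumptions; only the filtering rule (convert to uppercase, keep A-Z) comes from the text, and it applies to alphabet A only.

```python
# Step 1 sketch: clean a corpus to the 26-letter alphabet A-Z and pick a
# contiguous subset to encrypt as the "unknown" ciphertext.
import random
import re


def clean_corpus(text):
    """Uppercase the text and discard every character outside A-Z."""
    return re.sub(r"[^A-Z]", "", text.upper())


def load_corpus(path):
    """Read a text file (path is whatever file you saved) and clean it."""
    with open(path, encoding="utf-8", errors="ignore") as handle:
        return clean_corpus(handle.read())


def pick_subset(corpus, length, seed=0):
    """Choose a random contiguous block of `length` letters from the corpus."""
    rng = random.Random(seed)
    start = rng.randrange(len(corpus) - length + 1)
    return corpus[start:start + length]


sample = pick_subset(clean_corpus("The cracking of the Enigma code, 1939-1945!"), 10)
print(sample)
```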
Step 2. Compute the n-gram (symbol, digram, trigram, etc.) probability distributions from histograms that you compute given the plaintext corpus you constructed in Step 1. From the probability distributions, you can compute statistical measures associated with each histogram (e.g., mean, mode, median, and standard deviation).
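Step 2 can be sketched as below. Counting overlapping n-grams and using the population standard deviation are choices made for this illustration; the text specifies neither.

```python
# Step 2 sketch: n-gram probability distributions and summary statistics.
import statistics
from collections import Counter


def ngram_distribution(text, n):
    """Map each overlapping n-gram in text to its relative frequency."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    return {gram: count / total for gram, count in Counter(grams).items()}


def summarize(dist):
    """Mean, median, and standard deviation of the probabilities, plus the mode."""
    probs = list(dist.values())
    return {
        "mean": statistics.mean(probs),
        "median": statistics.median(probs),
        "stdev": statistics.pstdev(probs),
        "mode": max(dist, key=dist.get),  # most frequent n-gram
    }


digrams = ngram_distribution("ABABCAB", 2)
print(digrams, summarize(digrams))
```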
Step 3. Construct each rotor's transform to specify the rotor machine, then formulate the adjacency matrix for each rotor transform as shown above. Thus, if you have n = 3 rotors, there will be three adjacency matrices. Initialize the matrices to the values discussed in the preceding observation. Additionally, assume odometer gearing only, i.e., each rotor advances one symbol or position for each complete revolution of the next-less-significant rotor. You will also need to specify the rotor initial position as a known quantity; otherwise, you will have to guess the correct position among $|F|^n$ possible alternatives. For efficiency, start with a simple rotor machine (i.e., |F| < 5 and n < 3), with no Steckerboard or reflector.
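To see why a known initial position matters, the short sketch below enumerates the alternatives that would otherwise have to be tried; the function name is hypothetical. In the worst case, the optimization in the following steps would have to be repeated for each candidate, so the cost grows as $|F|^n$.

```python
# If the initial rotor position is not known, every tuple (p_0, ..., p_{n-1})
# of starting positions is a candidate.
from itertools import product


def candidate_start_positions(alphabet_size, n_rotors):
    """Yield all |F|^n tuples of initial rotor positions."""
    return product(range(alphabet_size), repeat=n_rotors)


print(sum(1 for _ in candidate_start_positions(4, 2)))    # 16
print(sum(1 for _ in candidate_start_positions(26, 3)))   # 17576
```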
Step 4. In order to constrain the optimization process, you need a merit function, also called an objective function in optimization theory. This function (which we denote as f) tells you how close you are to satisfying the constraint that directs your MLE-based optimization. For example, in this case, the objective function could compute the norm of the difference between the probability distributions of the candidate (decrypted) plaintext Pr(b) and the known (corpus) plaintext Pr(a) as
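$$f(a,b) = \lVert \Pr(a) - \Pr(b) \rVert = \Big[\sum_{x} \big(\Pr(a)_x - \Pr(b)_x\big)^2\Big]^{1/2},$$
where $\Pr(a)_x$ and $\Pr(b)_x$ denote the probabilities that the two distributions assign to the symbol (or n-gram) x.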
Clearly, one would want to minimize this difference. Note that the objective function can also be formulated from statistical parameters such as the mean, mode, and standard deviation, as well as from the histograms (or probability distributions) of various n-grams (e.g., digrams and trigrams).
Step 5. Apply the MLE approach discussed in class as follows:
b) Configure your rotor machine according to the transforms described by the adjacency matrices you computed in Step 5a), and apply the rotor machine decryption to the ciphertext obtained in Step 1. This yields a trial decryption b.
c) Apply your objective function to b to obtain a difference score between statistical measures derived from b and the plaintext corpus.
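Steps 4 and 5 can be sketched together for a one-rotor machine with no Steckerboard or reflector, though over the full 26-letter alphabet rather than the very small alphabet recommended in Step 3. Step 5a) and the MLE approach discussed in class are not reproduced in this text, so the sketch substitutes plain greedy hill climbing: swap two entries of the rotor wiring (equivalently, two rows of its adjacency matrix), decrypt, score with f, and keep the change only if f decreases. The rotor model, corpus, and wiring are hypothetical.

```python
# Steps 4-5 sketch: greedy hill climbing on a single stepping rotor, scored by
# f(a, b) = ||Pr(a) - Pr(b)|| over single-letter frequencies.
# ASSUMPTIONS: the rotor model, greedy acceptance (a stand-in for the MLE
# approach from class), and all names and data below are hypothetical.
import random
from collections import Counter

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
N = len(ALPHABET)


def unigram(text):
    """Pr(.) as a list of letter frequencies in alphabet order."""
    counts = Counter(text)
    total = len(text) or 1
    return [counts[ch] / total for ch in ALPHABET]


def objective(pa, pb):
    """Euclidean norm of the difference between two distributions."""
    return sum((x - y) ** 2 for x, y in zip(pa, pb)) ** 0.5


def encrypt(sigma, text, p0=0):
    """Single rotor at position p0 + i for the i-th letter."""
    out = []
    for i, ch in enumerate(text):
        p = (p0 + i) % N
        out.append(ALPHABET[(sigma[(ALPHABET.index(ch) + p) % N] - p) % N])
    return "".join(out)


def decrypt(sigma, text, p0=0):
    """Invert encrypt() using the inverse wiring at the same positions."""
    inv = [0] * N
    for i, j in enumerate(sigma):
        inv[j] = i
    return encrypt(inv, text, p0)


def hill_climb(ciphertext, corpus_dist, iterations=500, seed=0):
    """Swap two wiring entries (two matrix rows); keep only improvements."""
    rng = random.Random(seed)
    sigma = list(range(N))                     # initial guess: identity wiring
    best = objective(corpus_dist, unigram(decrypt(sigma, ciphertext)))
    history = [best]
    for _ in range(iterations):
        i, j = rng.sample(range(N), 2)
        sigma[i], sigma[j] = sigma[j], sigma[i]
        score = objective(corpus_dist, unigram(decrypt(sigma, ciphertext)))
        if score < best:
            best = score                       # keep the perturbation
        else:
            sigma[i], sigma[j] = sigma[j], sigma[i]   # undo it
        history.append(best)
    return sigma, best, history


corpus = "THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG" * 20   # stand-in corpus
secret = [(7 * k + 3) % N for k in range(N)]         # hypothetical wiring
ciphertext = encrypt(secret, corpus[:200])
sigma, best, history = hill_climb(ciphertext, unigram(corpus))
print(round(history[0], 4), "->", round(best, 4))
```

Because this greedy variant records only the best score so far, its history cannot oscillate; averaging over recent iterations matters for methods that also accept some worse moves.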
The goal of the first few iterations of steps 5a) through 5c) is to obtain a large decrease in the output of your objective function. Since you want to approach a near-zero difference in the objective function output as quickly as possible, this implies fast convergence in the initial iterations of the MLE optimization process, as shown in Region A of Figure 5. In subsequent iterations, you will need to time average the output of the objective function f by averaging f's output over the last K iterations. Averaging is essential to remove the oscillations shown in Region B of Figure 5. Without averaging, you will not be able to reach a minimum in f's output.
Figure 5. Hypothetical objective function output that schematically illustrates zones of convergence in a constraint-based optimization problem.
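The time averaging described above amounts to a trailing moving average of the recorded values of f. A minimal sketch, assuming the values are kept in a Python list and K is passed as k; the function name is illustrative.

```python
# Time average of f over the last K iterations (a trailing moving average).
from collections import deque


def moving_average(values, k):
    """Average of the most recent k values (fewer at the start)."""
    window, total, averages = deque(), 0.0, []
    for value in values:
        window.append(value)
        total += value
        if len(window) > k:
            total -= window.popleft()
        averages.append(total / len(window))
    return averages


print(moving_average([4, 2, 6, 0], 2))   # [4.0, 3.0, 4.0, 3.0]
```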
Step 6. After you pass through Region B (slow, oscillatory decrease in f's output), one usually encounters smaller oscillations in Region C, where the average output of f decreases very slowly and the rate of convergence approaches zero. (In order to compute the rate of convergence, take the first derivative with respect to time of the time average of f's output. You may even want to average these derivatives over a few samples, to remove unwanted noise.)
When the rate of convergence brings the average of f's output to within some limit ε of zero, it is time to stop the MLE process. In practice, the choice of ε depends largely upon the quantization error inherent in the n-gram histograms used to compute f. Although this is not usually a problem for large plaintext corpora and ciphertext samples, you may have to choose a ciphertext of several thousand characters to obtain an average quantization error within, say, two percent of full scale (an error of 0.02 in a probability distribution). It would be helpful to recall the discussions we had in class about quantization error and error analysis. Then, use that theory to predict the limiting error with which various symbols can be determined from histogram data.
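Steps 6 and 7 can be sketched as a finite-difference rate of convergence computed from the time-averaged values of f, plus a stop test against a small limit (called eps here). Requiring the averaged rate to be near zero as well is an extra safeguard chosen for this illustration, not something the text prescribes; rate_window and rate_tol are hypothetical parameters.

```python
# Steps 6-7 sketch: rate of convergence and a stopping test.
# ASSUMPTIONS: one time unit per iteration; rate_window and rate_tol are
# illustrative extra parameters; eps is the limit on the averaged f.


def convergence_rate(averages):
    """Discrete first derivative of the time-averaged objective."""
    return [b - a for a, b in zip(averages, averages[1:])]


def should_stop(averages, eps, rate_window=5, rate_tol=1e-4):
    """Stop when the averaged f is below eps and has levelled off."""
    if len(averages) <= rate_window:
        return False
    recent = convergence_rate(averages[-(rate_window + 1):])
    mean_rate = sum(recent) / len(recent)   # smooth the derivative over a few samples
    return averages[-1] < eps and abs(mean_rate) < rate_tol


print(should_stop([1.0] * 10, eps=2.0), should_stop([1.0] * 10, eps=0.5))   # True False
```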
Step 7. When you have recognizable words or phrases in b, this can also be a sign that the MLE process is coming to a convergence point. At this point, you may want to stop the MLE algorithm and guess the remainder of the text. From the guessed text and the known ciphertext, you can confirm the rotor machine's configuration. A glance at your solution to Homework Problem 1.1 (Vigenere cipher) may help you here.
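For a one-rotor machine, confirming the configuration from guessed text works much like recovering a Vigenere key. Under an assumed rotor model in which a rotor at position p maps x to (sigma[(x + p) mod |F|] - p) mod |F|, each guessed plaintext letter and its ciphertext letter pin down one wiring entry, and a contradiction signals a wrong guess or wrong assumptions (for example, the starting position). Names and data are hypothetical.

```python
# Step 7 sketch: confirm a single stepping rotor's wiring from guessed
# plaintext and known ciphertext. ASSUMPTION: the rotor at position p maps
# x to (sigma[(x + p) % N] - p) % N, so sigma[(x + p) % N] = (c + p) % N.

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
N = len(ALPHABET)


def encrypt(sigma, text, p0=0):
    """Single rotor at position p0 + i for the i-th letter."""
    out = []
    for i, ch in enumerate(text):
        p = (p0 + i) % N
        out.append(ALPHABET[(sigma[(ALPHABET.index(ch) + p) % N] - p) % N])
    return "".join(out)


def recover_wiring(plaintext, ciphertext, p0=0):
    """Fill in every wiring entry implied by the plaintext/ciphertext pairs."""
    sigma = [None] * N
    for i, (pch, cch) in enumerate(zip(plaintext, ciphertext)):
        p = (p0 + i) % N
        k = (ALPHABET.index(pch) + p) % N
        v = (ALPHABET.index(cch) + p) % N
        if sigma[k] is not None and sigma[k] != v:
            raise ValueError("pair %d contradicts the assumed configuration" % i)
        sigma[k] = v
    return sigma


secret = [(7 * k + 3) % N for k in range(N)]   # hypothetical wiring
guess = "ATTACKATDAWN"
recovered = recover_wiring(guess, encrypt(secret, guess))
print(sum(v is not None for v in recovered), "of", N, "entries recovered")
```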
Be aware that the MLE process usually does not produce a perfect decryption, due to quantization error, computational errors, and erroneous initial assumptions. However, with practice (starting with a one-rotor machine over a very small alphabet), you will be able to obtain reasonably efficient guesses at rotor machine configurations.