Implement the rotor machine simulator (4 base points, plus 4 extra points for Alphabets B-E), then choose options from List 1 to bring your total up to 12 points. No extra credit will be given for points beyond the maximum total of 12 points.
Implement the rotor machine simulator (4 base points, plus 2 extra points for Alphabets B-E), then choose options from List 1 or List 2 to bring your total score up to 14 points. Your options must total at least 8 points, of which at least 3 points must come from List 2. We strongly recommend that you try the Maximum Likelihood Estimation cryptanalysis. No extra credit will be given for points beyond the maximum total of 12 points.
#define preprocessor command as needed).
Programs should handle erroneous input and provide help to the user. Read the Programming Hints at the end of this document to help make your life a little easier.
Recall our discussion of the various rotor machines in class, and in Section 2.3 of the Web-accessible class notes, then implement the following basic rotor machine:
-e or -d switch.
-p ptxt-file switch, where ptxt-file denotes a file containing plaintext. We say that the switch is optional because, if it is not on the command line and encryption mode is chosen, then plaintext is taken from stdin. Conversely, in decryption mode, plaintext would be written to stdout.
-c ctxt-file switch, where ctxt-file denotes a file containing ciphertext. We say that the switch is optional because, if it is not on the command line and decryption mode is chosen, then ciphertext is taken from stdin. Conversely, in encryption mode, ciphertext would be written to stdout.
-n nrotor switch, where nrotor denotes the number of rotors.
Choose Alphabet A for all your implementations. Then, if you want extra points, implement Alphabets B-E (1 extra point per alphabet). If your program does not implement a particular plaintext/ciphertext alphabet, then your program should issue an error message to stdout when the -a switch is used to specify that alphabet.
-x posnvec switch, where posnvec is a list of n integers that correspond to the indices of characters in whatever input alphabet you choose. These indices represent the initial rotor positions. For example, if using Alphabet A with a 3-rotor machine, one could specify -x 12 22 17.
-t n-rotorfiles switch, where n-rotorfiles denotes n files, each of which specifies one of the n rotors. Each rotor specification file will have the form (e.g., for Alphabet A, above):
#Your name, SSN, Crypto-I Proj-2 Rotorfile F-96
BQCXQREYTUFT...
where the preceding example denotes the following map:
A -> B
B -> Q
C -> C
D -> X
 : etc.
That is, the input symbol of each mapping is given implicitly by the position of its output symbol in the string.
-g switch on the command line, followed by any one of these values (you must implement all of them):
f for forward odometer gearing, i.e., Rotor #n moves once for each input character, and Rotor #n-1 moves once for each revolution (every |F| steps) of Rotor #n, where F denotes one of the alphabets listed above.
b for backward odometer gearing, i.e., Rotor #1 moves once for each input character, and Rotor #2 moves once for each revolution (every |F| steps) of Rotor #1, where F denotes one of the alphabets listed above.
c gearfile for custom odometer gearing, to be specified in the file gearfile, which has the following format:
#Your name, SSN, Crypto-I Proj-2 Gearfile F-96
Ri: 3@K
R1: 1@R5
R2: 4@R3
 :
 :
Rn: 7@R2
which means that (a) Rotor #i advances three symbols each time an input symbol is read (or a key is depressed); (b) Rotor #1 advances once for each revolution of Rotor #5; (c) Rotor #2 advances four symbols for each revolution of Rotor #3; etc.
For example, a standard odometer gearing could be specified as:
#Your name, SSN, Crypto-I Proj-2 Gearfile F-96
R1: 1@K
R2: 1@R1
R3: 1@R2
 :
 :
Rn: 1@R(n-1)
where R1 would contain the least significant digit of the odometer.
Note: All rotor gearings (forward, backward, and custom) have their starting position specified by the -x posnvec switch, described above. This starting position can be thought of as the "000000" position on an odometer.
Ensure that your gearing file specifies an acyclic connectivity graph among the rotors, to avoid backlash in the "gearing mechanism" (e.g., you do not have a situation where Rotor #1 drives Rotor #3, which drives Rotor #5, which drives Rotor #1). Common algorithms for cycle detection in directed graphs can be used for this, since the gearing file specifies the arcs of a digraph.
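The three gearings and the acyclicity check can be modeled together, as in the C sketch below. It assumes that each rotor has exactly one driver (the keyboard or another rotor), as in the gearfile format, and that a "revolution" means a return to the starting position set by -x, as the note above describes; offsets are therefore counted from that "000000" position. Rotors are indexed from 0 in the code, and the names Gearing, key_press, set_odometer, gearing_has_cycle, and DRIVEN_BY_KEY are hypothetical. The cycle check is the standard three-color depth-first search: meeting a rotor that is still on the DFS stack means a back edge, hence a cycle. Acyclicity also matters for the stepping code, whose recursive propagation of revolutions only terminates on an acyclic graph.

```c
#define MAX_ROTORS 16
#define DRIVEN_BY_KEY (-1)   /* hypothetical marker for a rotor geared "@K" */

typedef struct {
    int n;                    /* number of rotors (-n) */
    int size;                 /* alphabet size |F| */
    int start[MAX_ROTORS];    /* starting positions from -x posnvec */
    int offset[MAX_ROTORS];   /* steps since the "000000" position, mod |F| */
    int driver[MAX_ROTORS];   /* DRIVEN_BY_KEY or 0-based index of the driving rotor */
    int count[MAX_ROTORS];    /* symbols advanced per driving event */
} Gearing;

enum { WHITE, GRAY, BLACK };  /* DFS colors: unvisited, on the stack, finished */

static int dfs_finds_cycle(const Gearing *g, int u, int color[]) {
    color[u] = GRAY;
    for (int v = 0; v < g->n; v++) {
        if (g->driver[v] != u) continue;        /* arc u -> v: rotor u drives rotor v */
        if (color[v] == GRAY) return 1;         /* back edge: cycle */
        if (color[v] == WHITE && dfs_finds_cycle(g, v, color)) return 1;
    }
    color[u] = BLACK;
    return 0;
}

/* Returns 1 if the gearing digraph contains a cycle (reject the gearfile). */
int gearing_has_cycle(const Gearing *g) {
    int color[MAX_ROTORS] = {0};                /* all WHITE */
    for (int u = 0; u < g->n; u++)
        if (color[u] == WHITE && dfs_finds_cycle(g, u, color)) return 1;
    return 0;
}

/* Current position of rotor r as an index into the alphabet. */
int rotor_position(const Gearing *g, int r) {
    return (g->start[r] + g->offset[r]) % g->size;
}

/* Advance rotor r by steps symbols; each full revolution it completes
 * advances every rotor it drives by that rotor's count. */
static void advance(Gearing *g, int r, long steps) {
    long total = g->offset[r] + steps;
    long revolutions = total / g->size;
    g->offset[r] = (int)(total % g->size);
    if (revolutions == 0) return;
    for (int k = 0; k < g->n; k++)
        if (g->driver[k] == r)
            advance(g, k, revolutions * g->count[k]);
}

/* Call once per input symbol (key press). */
void key_press(Gearing *g) {
    for (int k = 0; k < g->n; k++)
        if (g->driver[k] == DRIVEN_BY_KEY)
            advance(g, k, g->count[k]);
}

/* Preset the two odometer gearings selected by -g f or -g b. */
void set_odometer(Gearing *g, char mode) {
    for (int k = 0; k < g->n; k++) {
        g->offset[k] = 0;
        g->count[k] = 1;
        if (mode == 'f')   /* last rotor is fastest; each rotor follows the next one */
            g->driver[k] = (k == g->n - 1) ? DRIVEN_BY_KEY : k + 1;
        else               /* 'b': first rotor is fastest; each follows the previous */
            g->driver[k] = (k == 0) ? DRIVEN_BY_KEY : k - 1;
    }
}
```

A simulator would fill driver[] and count[] from the -g option (calling set_odometer for f or b, or reading the gearfile for c), reject the configuration if gearing_has_cycle returns 1, and call key_press once per input symbol.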
The symbol space for plaintext and ciphertext should be the same size, and should either be identical or should be [a..z] for the plaintext symbol space (denoted by P) and [A..Z] for the ciphertext symbol space (denoted by C).
Notation. In the following description, let p[i] denote the ith element of the plaintext and c[i] the ith element of the ciphertext, starting with i = 0. Also let N denote the size of the symbol set, and let L denote the length of the vector that contains the initial rotor positions x[j], j = 1..n.
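The per-rotor transform is not spelled out here, so the sketch below assumes a common textbook model: a rotor with wiring w at position s sends symbol x to (w[(x + s) mod N] - s) mod N, encryption applies rotors 1..n in order, and decryption applies the inverse rotors in reverse order at the same positions. It also shows one way to load a -t rotor file, assuming a single # header line followed by the N output symbols of Alphabet A (taken here to be A..Z, with N = 26). Names such as load_rotor, encrypt_symbol, and N_SYM are hypothetical.

```c
#include <stdio.h>
#include <ctype.h>

#define N_SYM 26          /* assumption: Alphabet A is A..Z, so N = 26 */

static int mod(int a, int m) { int r = a % m; return r < 0 ? r + m : r; }

/* Read one rotor map in the -t file format into w (w[i] = output index for
 * input index i). '#' lines and blank lines are skipped. Returns 0 on
 * success, -1 if the map is missing, the wrong length, uses a symbol
 * outside A..Z, or repeats a symbol (not a bijection). */
int load_rotor(FILE *f, int w[N_SYM]) {
    char line[256];
    while (fgets(line, sizeof line, f) != NULL) {
        if (line[0] == '#') continue;                 /* header line */
        int seen[N_SYM] = {0}, count = 0;
        for (const char *p = line; *p != '\0'; p++) {
            if (isspace((unsigned char)*p)) continue;
            int c = toupper((unsigned char)*p);
            if (c < 'A' || c > 'Z' || count == N_SYM || seen[c - 'A']++) return -1;
            w[count++] = c - 'A';
        }
        if (count == 0) continue;                     /* blank line */
        return count == N_SYM ? 0 : -1;
    }
    return -1;                                        /* no map line found */
}

/* inv[w[i]] = i: the map a signal uses when it travels back through a rotor. */
void invert_rotor(const int w[N_SYM], int inv[N_SYM]) {
    for (int i = 0; i < N_SYM; i++) inv[w[i]] = i;
}

/* Assumed rotor model: at position s, symbol x enters contact x, meets the
 * wiring at (x + s) mod N, and exits shifted back by s. */
int rotor_forward(const int w[N_SYM], int s, int x) {
    return mod(w[mod(x + s, N_SYM)] - s, N_SYM);
}

/* Exact inverse of rotor_forward at the same position s. */
int rotor_backward(const int inv[N_SYM], int s, int y) {
    return mod(inv[mod(y + s, N_SYM)] - s, N_SYM);
}

/* c[i] from p[i]: rotors 1..n in order at their current positions pos[]. */
int encrypt_symbol(int n, int w[][N_SYM], const int pos[], int p) {
    for (int j = 0; j < n; j++) p = rotor_forward(w[j], pos[j], p);
    return p;
}

/* p[i] from c[i]: inverse rotors n..1 at the same positions. */
int decrypt_symbol(int n, int inv[][N_SYM], const int pos[], int c) {
    for (int j = n - 1; j >= 0; j--) c = rotor_backward(inv[j], pos[j], c);
    return c;
}
```

The positions pos[] would come from the gearing state at step i, so decryption stays in step with encryption as long as both sides start from the same -x vector and step the rotors identically.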
Implement an additional rotor that is a reflector, as we discussed in class. The reflector transform should be specified on the command line using the -r switch, and by adding one more rotor map to the file rotorfile specified for the -t option, above. However, the reflector cannot map any symbol to itself. Why?
Scoring: 2 points.
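A small validation sketch for the -r map. It assumes, as with the Enigma's reflector, that the reflector connects contacts in pairs, so it must be an involution (refl[refl[x]] == x), in addition to the handout's rule that no symbol maps to itself. The function name valid_reflector and the constant N_SYM are hypothetical. A map satisfying both conditions exists only when |F| is even.

```c
#define N_SYM 26   /* assumption: Alphabet A, |F| = 26 */

/* Returns 1 if refl is in range, has no fixed points, and pairs symbols
 * (refl[refl[x]] == x); returns 0 otherwise. */
int valid_reflector(const int refl[N_SYM]) {
    for (int x = 0; x < N_SYM; x++)
        if (refl[x] < 0 || refl[x] >= N_SYM) return 0;   /* out of range */
    for (int x = 0; x < N_SYM; x++) {
        if (refl[x] == x) return 0;                      /* maps a symbol to itself */
        if (refl[refl[x]] != x) return 0;                /* not an involution */
    }
    return 1;
}
```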
Option 1b. Steckerboard:
Implement a Steckerboard substitution, as follows: Specify the Steckerboard transposition as a bijection on the alphabet(s) you implement, using the command-line switch -s steckerfile, where the file steckerfile has the same format as a rotor file. You can use one of your rotor files for this. However, the Steckerboard rotor will be a non-rotating element. Question: How can you do this with no additional effort?
Scoring: 1 point.
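The sketch below shows where the Steckerboard sits in the signal path, assuming the order listed for the Option 1c display (Steckerboard, rotors, reflector, rotors in reverse order, Steckerboard) and the same rotor wiring model y = (w[(x + s) mod N] - s) mod N. Because the handout only requires the Steckerboard to be a bijection, not an involution, the return pass uses its inverse map. Machine, machine_symbol, and through are hypothetical names.

```c
#define N_SYM 26            /* assumption: Alphabet A, |F| = 26 */
#define MAX_ROTORS 16

static int mod(int a, int m) { int r = a % m; return r < 0 ? r + m : r; }

/* Pass x through a rotor map at position s (assumed wiring model). Given
 * the inverse map, the same formula undoes the forward pass. */
static int through(const int map[N_SYM], int s, int x) {
    return mod(map[mod(x + s, N_SYM)] - s, N_SYM);
}

typedef struct {
    int n;                              /* number of rotors */
    int wire[MAX_ROTORS][N_SYM];        /* rotor maps (-t) */
    int inv[MAX_ROTORS][N_SYM];         /* inverse rotor maps */
    int pos[MAX_ROTORS];                /* current rotor positions */
    int refl[N_SYM];                    /* reflector (-r) */
    int steck[N_SYM];                   /* Steckerboard (-s); never rotates */
    int steck_inv[N_SYM];               /* its inverse, for the return pass */
} Machine;

/* One symbol: Steckerboard, rotors 1..n, reflector, rotors n..1,
 * then the inverse Steckerboard. */
int machine_symbol(const Machine *m, int x) {
    x = m->steck[x];
    for (int j = 0; j < m->n; j++) x = through(m->wire[j], m->pos[j], x);
    x = m->refl[x];
    for (int j = m->n - 1; j >= 0; j--) x = through(m->inv[j], m->pos[j], x);
    return m->steck_inv[x];
}
```

When the reflector is a fixed-point-free involution, this whole transformation is its own inverse at any fixed rotor setting, so the same routine both encrypts and decrypts, and no symbol ever encrypts to itself.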
Option 1c. ASCII-vt100 Display:
Implement a display (keyboard, lamps, Steckerboard input and output, and current rotor settings) that uses the vt100 protocol and is activated by the command-line switch -v.
For Alphabet A, the display should look more or less like this:
Keybd:   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Input:   nowisthetimeforeverymantocometotheaidofhisparty
Stecker: wimte... ...metdf
Rotors:  CFDGEHJLINMKORPTSVQUYXWZBA
R1:      | | | | | | |
         BADCFGIEJHMLKONRPTQSVWYUZX
R2:      | | | | | | |
         DBGAIFCEHJMLOKRPVNTYUSXQWZ
 :        :
 :        :
Rn:      | | | | | | |
         RQTUSXZWACVBYGDEFILJKHPNMO
Lamps:   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Output:  ccdhbuiidfyuwtcgskvcpopnprfieuyashqwkclifubjdkc
If a reflector is present, you will need to show the reflector and duplicate the rotors and Steckerboard, in the following order:
Keyboard display
Input
Steckerboard
Rotors
Reflector
Rotors (reverse order)
Steckerboard
Lamps
Output (build characterwise)
Put this (or a reasonable equivalent) in a nice little ASCII display box with your name, SSN, "CRYPTO-I F-96", and "Project #1".
The preceding display must have the following animation features (use Alphabets A, B, and C only):
-k that allows manual entry of symbols (via your workstation keyboard), instead of reading the symbols from an input file or stdin.
Scoring: (a) 1 base point if you show the input, output at each rotor, and Steckerboard output in both the forward and reverse directions (i.e., before and after reflector), together with the output, (b) 2 base points if the display is implemented as shown above, and (c) 1 extra point for the keyboard input option (#3 in the preceding list).
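The display can be driven with standard VT100 control sequences: ESC [ 2 J erases the screen, ESC [ row ; col H moves the cursor, and ESC [ 7 m and ESC [ 0 m switch reverse video on and off, which is one way to "light" a lamp. The sketch below only builds strings, so the drawing logic can be tested without a terminal; format_lamps, vt_goto, and the VT_ macros are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* VT100 control sequences (ESC = \033). */
#define VT_CLEAR   "\033[2J"   /* erase the entire screen */
#define VT_REVERSE "\033[7m"   /* reverse video on */
#define VT_NORMAL  "\033[0m"   /* all character attributes off */

/* Write ESC [ row ; col H (move the cursor, 1-based) into buf. */
int vt_goto(char *buf, size_t len, int row, int col) {
    return snprintf(buf, len, "\033[%d;%dH", row, col);
}

/* Build a "Lamps:" row for the alphabet, showing symbol index lit in
 * reverse video (lit < 0 lights nothing). Returns the string length, or
 * -1 if buf is too small. */
int format_lamps(char *buf, size_t len, const char *alphabet, int lit) {
    size_t n = strlen(alphabet);
    int w = snprintf(buf, len, "Lamps:   ");
    if (w < 0 || (size_t)w >= len) return -1;
    size_t used = (size_t)w;
    for (size_t i = 0; i < n; i++) {
        if ((int)i == lit)
            w = snprintf(buf + used, len - used, VT_REVERSE "%c" VT_NORMAL " ", alphabet[i]);
        else
            w = snprintf(buf + used, len - used, "%c ", alphabet[i]);
        if (w < 0 || (size_t)w >= len - used) return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

On each key press, a program would print VT_CLEAR (or move to each row with vt_goto) and redraw the Keybd, Input, Stecker, rotor, Lamps, and Output rows in the order given above.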
Option 1d. Period Prediction:
Analyze and predict the rotor machine's period, with proof of correctness for your method. Implement your prediction strategy in a module or function that writes the predicted period to your program output.
In particular, you must have a -q switch on the command line, with the predicted period written to stdout. If you are planning on doing Option 1c, above, then you should include a slot for the predicted period in your display widget, to appear below the output line.
Scoring: 2 points.
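Option 1d asks for an analytic prediction with a proof; a brute-force check is still handy for testing that prediction on small machines. The sketch below assumes the one-driver-per-rotor gearing model of the gearfile format, steps the rotor state, and measures the length of the cycle it enters with Brent's cycle-detection algorithm. Because the substitution applied at each step is a function of the rotor state, the cipher's period divides this value; for forward or backward odometer gearing with single steps, the state period is |F|^n. The cost grows with the period, so keep |F| and n small. Gearing, state_period, and DRIVEN_BY_KEY are hypothetical names.

```c
#include <string.h>

#define MAX_ROTORS 16
#define DRIVEN_BY_KEY (-1)   /* hypothetical marker for a rotor geared "@K" */

typedef struct {
    int n, size;              /* number of rotors, alphabet size |F| */
    int offset[MAX_ROTORS];   /* steps since the starting position, mod |F| */
    int driver[MAX_ROTORS];   /* DRIVEN_BY_KEY or 0-based index of the driving rotor */
    int count[MAX_ROTORS];    /* symbols advanced per driving event */
} Gearing;

/* Advance rotor r; each full revolution drives the rotors geared to it. */
static void advance(Gearing *g, int r, long steps) {
    long total = g->offset[r] + steps;
    long revolutions = total / g->size;
    g->offset[r] = (int)(total % g->size);
    if (revolutions == 0) return;
    for (int k = 0; k < g->n; k++)
        if (g->driver[k] == r) advance(g, k, revolutions * g->count[k]);
}

static void key_press(Gearing *g) {
    for (int k = 0; k < g->n; k++)
        if (g->driver[k] == DRIVEN_BY_KEY) advance(g, k, g->count[k]);
}

static int same_state(const Gearing *a, const Gearing *b) {
    return memcmp(a->offset, b->offset, sizeof a->offset) == 0;
}

/* Length of the cycle that the rotor-state sequence eventually enters,
 * found with Brent's cycle-detection algorithm. */
long state_period(const Gearing *initial) {
    long power = 1, lambda = 1;
    Gearing tortoise = *initial, hare = *initial;
    key_press(&hare);
    while (!same_state(&tortoise, &hare)) {
        if (power == lambda) {          /* start a new power-of-two window */
            tortoise = hare;
            power *= 2;
            lambda = 0;
        }
        key_press(&hare);
        lambda++;
    }
    return lambda;
}
```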
Expand the Kasiski starter program you did in Project #1 to attack the output of a small rotor machine. For example, you might limit your configuration to two rotors and an eight-character subset of Alphabet A, above. Use a reasonably sized plaintext corpus, and try several different gearing schemes. You might also find it useful to implement the Period Prediction (Option 1d from List 1, above), which will help you test the Kasiski result for validity.
Scoring: 1 point, since the Kasiski routine from Project #1 will be made available. Or, if you want to get started early, you may share code with someone else who has done the Kasiski routine.
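For a quick check on small-machine output (not a replacement for the Project #1 routine), a stripped-down Kasiski test can take the greatest common divisor of the distances between repeated trigrams. A single chance repetition can pull this gcd down to 1, so a fuller version would tally the factors of all distances instead. kasiski_gcd is a hypothetical name.

```c
#include <string.h>

static long gcd(long a, long b) {
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Greatest common divisor of the distances from each trigram to its next
 * repetition in text; 0 if no trigram repeats. O(len^2), fine for tests. */
long kasiski_gcd(const char *text) {
    size_t len = strlen(text);
    long g = 0;
    for (size_t i = 0; i + 3 <= len; i++)
        for (size_t j = i + 1; j + 3 <= len; j++)
            if (memcmp(text + i, text + j, 3) == 0) {
                g = gcd(g, (long)(j - i));
                break;                       /* nearest repetition only */
            }
    return g;
}
```

Comparing this estimate with the predicted period from Option 1d is one way to test the Kasiski result for validity, as suggested above.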
Program 2b. Output Statistics Prediction:
Use the matrix-multiplication method described in class to implement automated prediction of your rotor machine's adjacency matrix (which describes the rotor connectivity graph at a given rotor position). Then, use the adjacency matrix and the input histogram to predict the output histogram, mean, standard deviation, mode, and median.
Hint: Once you produce the rotor machine's adjacency matrix, you have a graph that is not bipartite, i.e., a graph that depicts a bijection. This graph depicts a map M. For example, given the rotor machine as an encryption device, you would relabel the histogram h of symbols in your plaintext corpus (including n-grams) by applying M to domain(h).
Scoring: 1 point + 1 extra point if you implement display of the predicted output histogram for each change in the rotor position. The display should be a histogram that changes with each input symbol, located beneath the output line of the Option 1c display (List 1, above).
Hint: This feature will be useful for analyzing the success of your attack on rotor machines that is based on maximum likelihood estimation.
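A sketch of the matrix-multiplication idea, assuming that the adjacency matrix at a given rotor position is the 0/1 matrix A with A[x][y] = 1 when input symbol x leaves as y, and that histograms are row vectors. Under that convention the machine's matrix is the product A1 A2 ... An of the rotor matrices (rotor 1 first), and the predicted output histogram is the input histogram times that product; the mean, standard deviation, mode, and median then follow from the predicted histogram. The rotor wiring model y = (w[(x + s) mod N] - s) mod N and all names are assumptions.

```c
#define NS 26   /* alphabet size |F| (assumption: Alphabet A) */

static int mod(int a, int m) { int r = a % m; return r < 0 ? r + m : r; }

/* Adjacency matrix of one rotor with wiring w at position s:
 * a[x][y] = 1 when input x leaves the rotor as y. */
void rotor_matrix(const int w[NS], int s, double a[NS][NS]) {
    for (int x = 0; x < NS; x++) {
        for (int y = 0; y < NS; y++) a[x][y] = 0.0;
        a[x][mod(w[mod(x + s, NS)] - s, NS)] = 1.0;
    }
}

/* c = a * b (c must not alias a or b). With histograms as row vectors,
 * the left factor's map is applied first. */
void mat_mul(double a[NS][NS], double b[NS][NS], double c[NS][NS]) {
    for (int i = 0; i < NS; i++)
        for (int j = 0; j < NS; j++) {
            double sum = 0.0;
            for (int k = 0; k < NS; k++) sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}

/* Predicted output histogram: hout[y] = sum over x of hin[x] * m[x][y]. */
void predict_histogram(double m[NS][NS], const double hin[NS], double hout[NS]) {
    for (int y = 0; y < NS; y++) {
        double sum = 0.0;
        for (int x = 0; x < NS; x++) sum += hin[x] * m[x][y];
        hout[y] = sum;
    }
}
```

Recomputing the rotor matrices after every key press gives the per-position histograms that the extra-credit display would animate.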
Program 2c. Maximum Likelihood Attack:
Implement the maximum likelihood estimation (MLE) method described in class, together with the methods of digram- and trigram-based cryptanalysis of the Caesar and Vigenere ciphers that you developed for Project #1. Your goal is to successfully predict any given rotor machine configuration by performing the following steps:
Step 2. Initialize each rotor adjacency matrix to 1/|F|, which you may wish to perturb with small random values (< 1E-6) in order to "seed" your regression algorithm.
Step 3. Devise an objective function f for comparing the histogram of your trial output (decryption result) with the known plaintext corpus statistics. For example, you might try the norm of the difference between histograms, or the differences between histogram parameters (e.g., mean, std-deviation, etc.). Or, you might prefer another measure (e.g., Mahalanobis distance) that you can discover in the literature of pattern recognition.
Step 4. Apply the MLE technique we discussed in class, seeking at first to effect large perturbations in the merit function output (e.g., candidate plaintext histogram) by making small changes in the input (e.g., rotor adjacency matrix values).
Step 5. Refine your MLE approach to minimize the difference between known and candidate plaintext histograms. The method is as follows:
Step 6. As the algorithm begins to converge, test your candidate plaintext (Step 5c) using histograms of digrams, trigrams, and words separately and together. The adjacency matrices you obtain in Step 5d will be your best guess at the rotor machine's configuration that you produce by automated means. Be aware that the algorithm may not converge to zero difference. So, you will want to visually monitor the output as the algorithm iterates, until you have a candidate plaintext that has enough recognizable words and parts of words that you can guess the rest.
Step 7. Refine your best candidate plaintext by filling in symbols that the algorithm cannot guess. (Such incompleteness is usually due to various sources of noise or quantization error.) Your best approach here would be to complete the plaintext by hand, which is the method used in cryptanalytic practice.
Step 8. Using your ciphertext and candidate plaintext, refine your algorithm's "best guess" adjacency matrices obtained in Step 6 to be Boolean matrices. Then, program your rotor machine accordingly and use this to generate plaintext d from the ciphertext. Your plaintext d obtained from the rotor machine should match the plaintext you obtained in Step 7. If they don't match, check the settings on your rotor machine.
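Steps 2, 3, and 8 are mechanical enough to sketch. The code below assumes a uniform perturbation in [0, 1e-6) for Step 2, uses the squared Euclidean distance between normalized histograms as one possible objective f for Step 3 (minimizing it also minimizes the norm), and, for Step 8, rounds a real-valued matrix to a permutation greedily, largest remaining entry first. The greedy rule always yields a bijection but not necessarily the best one; the Hungarian algorithm solves that assignment problem exactly. Function names are hypothetical, and rand() is used only for illustration.

```c
#include <stdlib.h>

#define NS 26   /* alphabet size |F| (assumption: Alphabet A) */

/* Step 2: every entry starts at 1/|F| plus a perturbation in [0, 1e-6). */
void init_adjacency(double a[NS][NS], unsigned seed) {
    srand(seed);
    for (int x = 0; x < NS; x++)
        for (int y = 0; y < NS; y++)
            a[x][y] = 1.0 / NS + 1e-6 * ((double)rand() / ((double)RAND_MAX + 1.0));
}

/* Step 3: squared Euclidean distance between the normalized trial and
 * corpus histograms. Returns -1.0 if either histogram is empty. */
double histogram_distance(const double trial[NS], const double corpus[NS]) {
    double st = 0.0, sc = 0.0, d = 0.0;
    for (int i = 0; i < NS; i++) { st += trial[i]; sc += corpus[i]; }
    if (st <= 0.0 || sc <= 0.0) return -1.0;
    for (int i = 0; i < NS; i++) {
        double diff = trial[i] / st - corpus[i] / sc;
        d += diff * diff;
    }
    return d;
}

/* Step 8: greedy rounding to a Boolean (permutation) matrix for the first
 * n symbols; perm[x] = y marks the single 1 in row x. */
void booleanize(int n, double a[][NS], int perm[]) {
    int row_used[NS] = {0}, col_used[NS] = {0};
    for (int step = 0; step < n; step++) {
        int bx = -1, by = -1;
        for (int x = 0; x < n; x++) {
            if (row_used[x]) continue;
            for (int y = 0; y < n; y++)
                if (!col_used[y] && (bx < 0 || a[x][y] > a[bx][by])) { bx = x; by = y; }
        }
        perm[bx] = by;
        row_used[bx] = col_used[by] = 1;
    }
}
```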
Hint: (1) Start with a simple rotor machine configuration, e.g., one rotor to begin with. (2) Then add rotors using the odometer gearing, so you know exactly the period of the cipher. (3) After that, you can progress to backward (reverse) odometer gearing and then to custom rotor configurations. Do not try to cryptanalyze a complicated rotor machine initially, or you will never finish the project.
Hint: To achieve convergence, we suggest you start with a simplex algorithm, which can be found in many textbooks, including Numerical Recipes (in the UF library). A least-squares regression approach is also useful, but could take longer to converge.
Scoring: 3 points.
Program 2d. Genetic Algorithm based Cryptanalysis:
A more difficult approach to MLE-based cryptanalysis, which is harder to program and parameterize, uses genetic algorithms instead of the simplex approach. This often yields faster convergence to a local minimum of the difference between the histograms of the corpus and the candidate plaintext.
Implement Program 2c in terms of a genetic algorithm (GA), rather than the methods described above. References for GAs will shortly be added to this Web page, and the technique will be discussed briefly in class, as well as on the Web page for rotor machines.
Hint: Note that the peaks of the plaintext histogram usually denote more significant characters. This is true for digrams, which in English often represent phonemes (a significant fact, since English is a phonetic language). Thus, when parameterizing your genetic algorithm, start with a one-rotor machine (monoalphabetic substitution). Partition the parameters of the rotor adjacency matrix according to the frequency-of-occurrence of the associated input characters. For example, you may want to set the partition between the second and third quartiles of the histogram to start. Continue this practice as you increase the number of rotors. Also consider using the output statistics prediction method discussed in Program 2b to predict where your partition should be located in the additional rotors' adjacency matrices.
Scoring: 2 points, if done in addition to Program 2c.
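One ingredient of the partitioned GA in the hint can be sketched for the one-rotor case: rank the input symbols by frequency, split them into two classes at a chosen rank (the hint suggests starting near the boundary between the second and third quartiles), and use a swap mutation that only exchanges key entries between symbols in the same class, so every candidate stays a permutation. The key representation, the two-class split, and the names are assumptions; selection, crossover, and the fitness function (for example, a histogram distance as in Program 2c) are omitted.

```c
#include <stdlib.h>

#define NS 26   /* alphabet size |F| (assumption: Alphabet A) */

/* cls[x] = 0 for the `split` most frequent input symbols, 1 for the rest.
 * Ties are broken by symbol index so every symbol gets a distinct rank. */
void frequency_classes(const double freq[NS], int split, int cls[NS]) {
    for (int x = 0; x < NS; x++) {
        int rank = 0;
        for (int y = 0; y < NS; y++)
            if (freq[y] > freq[x] || (freq[y] == freq[x] && y < x)) rank++;
        cls[x] = rank < split ? 0 : 1;
    }
}

/* Swap mutation: exchange the key entries of two distinct inputs from the
 * same class, so the key remains a permutation. key[x] is the candidate
 * output symbol for input symbol x. */
void mutate(int key[NS], const int cls[NS]) {
    int a = rand() % NS, b, tries = 0;
    do {
        b = rand() % NS;
    } while ((b == a || cls[b] != cls[a]) && ++tries < 1000);
    if (b == a || cls[b] != cls[a]) return;      /* no partner found; skip */
    int t = key[a];
    key[a] = key[b];
    key[b] = t;
}
```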
man pages in UNIX. You can work together to determine the format that you use, but please write your own pages. It would be nice if everyone used the same format. See the consultants if you have questions about the documentation for the nroff -man macro package.
Bach, T. Evolutionary Algorithms in Theory and Practice, New York: Oxford University Press (1995)
Morder, D.D. "Constrained Optimization of Smooth Functions Using a Genetic Algorithm", NASA Technical Paper #3329, NASA Langley Research Center (1995). [In UF Library Microforms, call number is NAS 1.60:3329]
Adeli, H. Machine Learning, New York: Wiley (1995).
Kau, C.L. "Genetic Algorithms Applied to Least-Squares Curve Fitting", US Bureau of Mines TR #9339. [in UF Science Library, call number is I 28.23:9339 -- also check microforms]
Rolf, H.L. and G. Williams. Finite Mathematics, Dubuque, IA: W.C. Brown (1988).
Bellman, R. Perturbation Techniques in Mathematics, Physics, and Engineering, New York: Holt, Rinehart, and Winston (1966).
Bender, E.A. An Introduction to Mathematical Modelling, New York: John Wiley & Sons (1978).
Kevorkian, J. and J.D. Cole. Perturbation Methods in Applied Mathematics, New York: Springer-Verlag (1981).
Wan, F.Y.M. Mathematical Models and their Analysis, New York: Harper and Row (1989).
This concludes the description for Project #2. References will be included subsequently.