Cryptology - I: Project #2

Instructors: R.E. Newman-Wolfe and M.S. Schmalz

Undergrads:

Implement the rotor machine simulator (4 base points + 4 extra points for Alphabets B-E), then choose options from List 1 to bring your total up to 12 points. No extra credit will be given for points beyond the maximum total of 12 points.

Grads:

Implement the rotor machine simulator (4 points + 2 extra points for Alphabets B-E), then choose options from List 1 or List 2 to bring your total score up to 14 points. Options must total at least 8 points, at least 3 of which must come from List 2. We strongly recommend that you try the Maximum Likelihood Estimation cryptanalysis (Program 2c). No extra credit will be given for points beyond the maximum total of 14 points.

General:

You shall submit electronically (to Dr. Newman-Wolfe only):

Documentation in the form of one man page for each program, formatted like the Unix man pages (see /usr/man/man1/*). Include a section on the theory of operation, and be sure to specify input and output.

Makefile (see make(1)) - this should be cumulative for the entire term (i.e., every makefile you submit should include commands to make everything you have submitted).

Complete source code, including header files. Code should contain enough internal comments to aid understanding and should be free of manifest constants (use the #define preprocessor directive as needed). Programs should handle erroneous input and provide help to the user.

Read the Programming Hints at the end of this document to help make your life a little easier.

Rotor Machine Simulator - Undergrads and Grads

Recall our discussion of the various rotor machines in class, and in Section 2.3 of the Web-accessible class notes, then implement the following basic rotor machine:

Encryption/decryption: Specifiable from the command line using a -e or -d switch.

Plaintext: Specifiable from the command line using an optional -p ptxt-file switch, where ptxt-file denotes the plaintext file (read in encryption mode, written in decryption mode). The switch is optional because, if it is absent and encryption mode is chosen, plaintext is read from stdin; likewise, if it is absent in decryption mode, plaintext is written to stdout.

Ciphertext: Specifiable from the command line using an optional -c ctxt-file switch, where ctxt-file denotes the ciphertext file (read in decryption mode, written in encryption mode). The switch is optional because, if it is absent and decryption mode is chosen, ciphertext is read from stdin; likewise, if it is absent in encryption mode, ciphertext is written to stdout.
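
The -p and -c rules above amount to a small stream-selection step. The sketch below assumes the switches have already been parsed (for example with POSIX getopt()); the helper name open_streams is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: choose the input and output streams.
 * encrypt != 0 means -e was given (otherwise -d); ptxt and ctxt are the
 * file names that followed -p and -c, or NULL if a switch was omitted.
 * Returns 0 on success, -1 if a named file cannot be opened. */
int open_streams(int encrypt, const char *ptxt, const char *ctxt,
                 FILE **in, FILE **out)
{
    /* Encryption reads plaintext and writes ciphertext; decryption is the
     * reverse.  A missing file name falls back to stdin or stdout. */
    const char *in_name  = encrypt ? ptxt : ctxt;
    const char *out_name = encrypt ? ctxt : ptxt;

    *in = in_name ? fopen(in_name, "r") : stdin;
    if (*in == NULL) {
        perror(in_name);
        return -1;
    }
    *out = out_name ? fopen(out_name, "w") : stdout;
    if (*out == NULL) {
        perror(out_name);
        if (*in != stdin)
            fclose(*in);          /* do not leak the input file */
        return -1;
    }
    return 0;
}
```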

Number of rotors: Specifiable from the command line using an -n nrotor switch, where nrotor denotes the number of rotors.

Alphabet: Specifiable from the command line using an -a switch, as one of the following:
Choose Alphabet A for all your implementations. Then, if you want extra points, implement Alphabets B-E (1 extra point per alphabet). If your program does not implement a particular plaintext/ciphertext alphabet, then your program should issue an error message to stdout when the -a switch is used to specify that alphabet.
Rotor initial position: Specifiable from the command line using a -x posnvec switch, where posnvec is a list of n integers that correspond to the indices (counted from 1) of characters in whatever input alphabet you choose. These indices represent the initial rotor positions. For example, if using Alphabet A with a 3-rotor machine, one could specify
-x 12 22 17
as the initial rotor positions, i.e., the rotors would start at letters L, V, and Q of {A,B,...,Z}.
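
Because the example treats A as index 1, a program will usually convert the -x values to 0-based offsets before using them as array subscripts. This is a minimal sketch; parse_posnvec is a hypothetical name, and args[] is assumed to point at the n words that follow -x in argv.

```c
#include <stdio.h>
#include <stdlib.h>

/* Convert the n arguments following -x (1-based symbol indices, so "12"
 * means L in Alphabet A) into 0-based rotor offsets.
 * Returns 0 on success, -1 on a non-numeric or out-of-range value. */
int parse_posnvec(char *const args[], int n, int alphabet_size, int offset[])
{
    for (int j = 0; j < n; j++) {
        char *end;
        long v = strtol(args[j], &end, 10);

        if (*args[j] == '\0' || *end != '\0' || v < 1 || v > alphabet_size) {
            fprintf(stderr, "bad rotor position '%s'\n", args[j]);
            return -1;
        }
        offset[j] = (int)(v - 1);
    }
    return 0;
}
```

With the example above, "12 22 17" yields offsets 11, 21, 16, i.e. the letters L, V and Q.
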
Rotor transforms: Specifiable from the command line using a -t n-rotorfiles switch, where n-rotorfiles denotes n files, each of which specifies one of the n rotors. Each rotor specification file will have the form (e.g., for Alphabet A, above):
```
#Your name, SSN, Crypto-I Proj-2 Rotorfile F-96
BQCX...
```
where the preceding example denotes the following map:
```
A -> B
B -> Q
C -> C
D -> X
  :
etc.
```

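That is, the input symbol is implicit: the kth character of the map line is the output for the kth symbol of the alphabet, so the line must contain every symbol of the alphabet exactly once.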
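
A rotor file for Alphabet A can be read and checked as sketched below. The function names and the 256-byte line buffer are assumptions; the sketch skips "#" comment lines and blank lines, rejects any map that is not a bijection, and builds the inverse map that decryption needs.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define ALPHA_N 26   /* Alphabet A: {A,...,Z} */

/* Parse one map line such as "BQCX..." into map[k] (image of the k-th
 * symbol) and inv[] (the inverse map).  Returns 0 for a bijection, else -1. */
int parse_rotor_line(const char *line, int map[ALPHA_N], int inv[ALPHA_N])
{
    int seen[ALPHA_N] = {0};
    size_t len = strcspn(line, " \t\r\n");   /* stop at trailing blanks */

    if (len != ALPHA_N)
        return -1;                           /* wrong number of symbols */
    for (int k = 0; k < ALPHA_N; k++) {
        int c = toupper((unsigned char)line[k]);
        if (c < 'A' || c > 'Z' || seen[c - 'A']++)
            return -1;                       /* not in alphabet, or repeated */
        map[k] = c - 'A';
        inv[c - 'A'] = k;
    }
    return 0;
}

/* Read the first non-comment, non-blank line of a rotor file. */
int read_rotor_file(FILE *fp, int map[ALPHA_N], int inv[ALPHA_N])
{
    char buf[256];

    while (fgets(buf, sizeof buf, fp) != NULL) {
        const char *p = buf;
        while (isspace((unsigned char)*p))
            p++;                             /* skip leading blanks */
        if (*p == '#' || *p == '\0')
            continue;                        /* comment or blank line */
        return parse_rotor_line(p, map, inv);
    }
    return -1;                               /* no map line found */
}
```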

Rotor gearing: Specified with a -g switch on the command line, followed by any one of these values (you must implement all of them):
- f for forward odometer gearing, i.e., Rotor #n moves once for each input character, and Rotor #n-1 moves once for each revolution of Rotor #n (that is, once every |F| steps of Rotor #n), where F denotes one of the alphabets listed above.
- b for backward odometer gearing, i.e., Rotor #1 moves once for each input character, and Rotor #2 moves once for each revolution of Rotor #1 (that is, once every |F| steps of Rotor #1), where F denotes one of the alphabets listed above.
- c gearfile for custom odometer gearing, to be specified in the file gearfile that has the following format:
```
#Your name, SSN, Crypto-I Proj-2 Gearfile F-96
Ri: 3@K
R1: 1@R5
R2: 4@R3
    :
    :
Rn: 7@R2
```
  which means that (a) Rotor #i advances three symbols each time an input symbol is read (or a key is depressed); (b) Rotor #1 advances once for each revolution of Rotor #5; (c) Rotor #2 advances four symbols for each revolution of Rotor #3, etc.
  For example, a standard odometer gearing could be specified as:
```
#Your name, SSN, Crypto-I Proj-2 Gearfile F-96
R1: 1@K
R2: 1@R1
R3: 1@R2
    :
    :
Rn: 1@R(n-1)
```
  where R1 would contain the least significant digit of the odometer.
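
One way to simulate all three gearings with a single mechanism is sketched below, under the assumption that each rotor has exactly one driver (the keyboard K or another rotor) and that "k@Rj" means "advance k symbols per completed revolution of Rj". The struct and function names are hypothetical; counts are kept relative to the -x start position.

```c
#define MAXROTORS 16
#define KEY (-1)          /* driver value meaning "the keyboard" (K) */

/* For rotor j (0-based):
 *   driver[j] = KEY, or the index of the rotor that drives it;
 *   amount[j] = symbols advanced per key press (KEY) or per full
 *               revolution of its driver;
 *   cnt[j]    = steps since the -x start, so the current offset is
 *               (x[j] + cnt[j]) % N. */
struct gearing {
    int n, N;
    int driver[MAXROTORS];
    int amount[MAXROTORS];
    int cnt[MAXROTORS];
};

/* Advance rotor j by k symbols, then pass each completed revolution on to
 * the rotors it drives.  Terminates only if the gear graph is acyclic. */
static void advance(struct gearing *g, int j, int k)
{
    int total = g->cnt[j] + k;
    int revs  = total / g->N;        /* revolutions completed by this move */

    g->cnt[j] = total % g->N;
    if (revs == 0)
        return;
    for (int d = 0; d < g->n; d++)
        if (g->driver[d] == j)
            advance(g, d, revs * g->amount[d]);
}

/* One key press: move every rotor driven directly by the keyboard. */
void key_press(struct gearing *g)
{
    for (int j = 0; j < g->n; j++)
        if (g->driver[j] == KEY)
            advance(g, j, g->amount[j]);
}
```

Under this model the -g f odometer is driver[n-1] = KEY with driver[j] = j + 1 for the other rotors and every amount equal to 1; -g b is the mirror image, driver[0] = KEY with driver[j] = j - 1.
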
Note: All rotor gearings (forward, backward, and custom) have their starting position specified by the -x posnvec switch, described above. This starting position can be thought of as the "000000" position on an odometer.
Ensure that your gearing file specifies an acyclic connectivity graph among the rotors, to avoid backlash in the "gearing mechanism" (e.g., you do not have a situation where Rotor #1 drives Rotor #3, which drives Rotor #5, which drives Rotor #1). Common algorithms for cycle detection in directed graphs can be used for this, since the gearing file specifies the arcs of a digraph.
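
The check can be done with the standard three-colour depth-first search for directed cycles. In the sketch below (function names hypothetical), each gearfile line such as "R2: 4@R3" contributes the arc R3 -> R2; comment lines simply fail to parse and can be skipped by the caller.

```c
#include <stdio.h>
#include <string.h>

#define MAXROTORS 16
#define KEY (-1)    /* source "K": the keyboard */

/* Parse "R2: 4@R3" or "R1: 1@K" into a 0-based rotor, its driver (KEY or a
 * 0-based rotor index) and the advance amount.  Returns 0 on success. */
int parse_gear_line(const char *line, int *rotor, int *driver, int *amount)
{
    int r, d, a;
    char src[16];

    if (sscanf(line, " R%d : %d @ %15s", &r, &a, src) != 3)
        return -1;
    if (strcmp(src, "K") == 0)
        d = KEY;                              /* driven by the keyboard */
    else if (sscanf(src, "R%d", &d) == 1 && d >= 1 && d <= MAXROTORS)
        d -= 1;                               /* driven by rotor Rd */
    else
        return -1;
    if (r < 1 || r > MAXROTORS || a < 1)
        return -1;
    *rotor = r - 1;
    *driver = d;
    *amount = a;
    return 0;
}

enum { WHITE, GREY, BLACK };   /* unvisited, on current path, finished */

/* drives[a][b] != 0 means rotor a drives rotor b. */
static int dfs(int drives[][MAXROTORS], int n, int v, int colour[])
{
    colour[v] = GREY;
    for (int w = 0; w < n; w++) {
        if (!drives[v][w])
            continue;
        if (colour[w] == GREY)
            return 1;                         /* back edge: cycle found */
        if (colour[w] == WHITE && dfs(drives, n, w, colour))
            return 1;
    }
    colour[v] = BLACK;
    return 0;
}

int gearing_has_cycle(int drives[][MAXROTORS], int n)
{
    int colour[MAXROTORS] = {0};              /* all WHITE */

    for (int v = 0; v < n; v++)
        if (colour[v] == WHITE && dfs(drives, n, v, colour))
            return 1;
    return 0;
}
```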

The Rotor Machine must be implemented exactly as described above, so that we can test your encryption/decryption capabilities and use the graduate students' simulators for cryptanalysis.

The symbol spaces for plaintext and ciphertext should be the same size, and should either be identical or be [a..z] for the plaintext symbol space (denoted by P) and [A..Z] for the ciphertext symbol space (denoted by C).

Notation. In the following description, let p[i] denote the ith element of the plaintext and c[i] the ith element of the ciphertext, starting with i = 0. Also let N denote the size of the symbol set, and let L denote the length of the vector of rotor initial positions x[j], j = 1..n.
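
In this notation, c[i] is p[i] passed through rotors 1..n at their positions for step i. The sketch below assumes one common model of a turned rotor, sigma(y) = R[(y + p) mod N] - p (mod N); other conventions shift in the opposite direction. pos[] holds the 0-based offsets (x[j] - 1 plus the steps taken so far), and the function names are hypothetical.

```c
#define MAXN 64        /* largest alphabet size supported */

/* Rotor wiring W turned by p positions (assumed convention). */
static int through(const int W[], int p, int N, int y)
{
    return ((W[(y + p) % N] - p) % N + N) % N;
}

/* Encrypt symbol index x (0..N-1) through rotors 0..n-1 at offsets pos[]. */
int rotor_encrypt(int R[][MAXN], const int pos[], int n, int N, int x)
{
    for (int j = 0; j < n; j++)
        x = through(R[j], pos[j], N, x);
    return x;
}

/* Decrypt with the inverse wirings Rinv[j], applied in reverse order:
 * if z = R[y+p] - p then y = Rinv[z+p] - p, so the same helper works. */
int rotor_decrypt(int Rinv[][MAXN], const int pos[], int n, int N, int y)
{
    for (int j = n - 1; j >= 0; j--)
        y = through(Rinv[j], pos[j], N, y);
    return y;
}
```

After each symbol, pos[] is updated according to the selected gearing before the next p[i] is processed.
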

List 1: Undergrads - Making an ENIGMA Machine

Now that you have built the basic rotor machine, you will choose options from the following list to make an ENIGMA machine from your rotor machine simulator.

Option 1a. Reflector:

Implement an additional rotor that is a reflector, as we discussed in class. The reflector is enabled with the -r switch on the command line, and its transform is supplied by adding one more rotor map to the rotor files specified for the -t option, above. However, the reflector cannot map any symbol to itself. Why?

Scoring: 2 points.
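
A program can reject an unusable reflector map when it is loaded. The sketch assumes, as in the ENIGMA, that the reflector wires symbols together in pairs (refl[refl[x]] == x), and it enforces the no-fixed-point rule stated above; together these force N to be even. The name reflector_ok is hypothetical.

```c
/* Return 1 if refl[0..N-1] pairs up all symbols with no symbol mapped to
 * itself, 0 otherwise. */
int reflector_ok(const int refl[], int N)
{
    if (N % 2 != 0)
        return 0;                 /* pairs without fixed points need even N */
    for (int x = 0; x < N; x++) {
        int y = refl[x];
        if (y < 0 || y >= N || y == x || refl[y] != x)
            return 0;             /* out of range, fixed point, or unpaired */
    }
    return 1;
}
```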

Option 1b. Steckerboard:

Implement a steckerboard substitution, as follows:

Specify the Steckerboard transposition as a bijection on the alphabet(s) you implement, using the command-line switch -s steckerfile, where the file steckerfile has the same format as a rotor file. You can use one of your rotor files for this. However, the Steckerboard rotor will be a non-rotating element. Question: How can you do this with no additional effort?

Scoring: 1 point.
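
With a reflector and Steckerboard in place, one symbol follows the path Steckerboard, rotors 1..n, reflector, rotors n..1, Steckerboard. The sketch below assumes the same turned-rotor convention sigma(y) = W[(y + p) mod N] - p (mod N) and uses inverse wirings on the return path, including the inverse Steckerboard map (the Steckerboard here is a general bijection, not necessarily a set of swapped pairs). Because the whole path is a conjugate of the reflector, each key maps to a different lamp and pressing that lamp's key returns the original symbol. All names are hypothetical.

```c
#define MAXN 64

/* Wiring W turned by p positions (assumed convention). */
static int shifted(const int W[], int p, int N, int y)
{
    return ((W[(y + p) % N] - p) % N + N) % N;
}

/* S/Sinv: Steckerboard map and inverse; R/Rinv: rotor wirings and inverses;
 * pos[]: current rotor offsets; refl: reflector; x: input symbol index. */
int enigma_symbol(const int S[], const int Sinv[], int R[][MAXN],
                  int Rinv[][MAXN], const int pos[], int n,
                  const int refl[], int N, int x)
{
    x = S[x];                                 /* into the Steckerboard */
    for (int j = 0; j < n; j++)
        x = shifted(R[j], pos[j], N, x);      /* outbound through rotors */
    x = refl[x];                              /* reflector */
    for (int j = n - 1; j >= 0; j--)
        x = shifted(Rinv[j], pos[j], N, x);   /* back through rotors */
    return Sinv[x];                           /* out of the Steckerboard */
}
```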

Option 1c. ASCII-vt100 Display:

Implement a display (keyboard, lamps, Steckerboard input and output, and current rotor settings) that uses the vt100 protocol, and is activated by a command-line switch -v. For Alphabet A, the display should look more or less like this:

```
   Keybd:    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

   Input:    nowisthetimeforeverymantocometotheaidofhisparty

   Stecker:  wimte...                               ...metdf

   Rotors:        CFDGEHJLINMKORPTSVQUYXWZBA
             R1:  |   |   |   |   |   |   |
                  BADCFGIEJHMLKONRPTQSVWYUZX
             R2:  |   |   |   |   |   |   |
                  DBGAIFCEHJMLOKRPVNTYUSXQWZ
              :               :
              :               :
             Rn:  |   |   |   |   |   |   |
                  RQTUSXZWACVBYGDEFILJKHPNMO

   Lamps:    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

  Output:    ccdhbuiidfyuwtcgskvcpopnprfieuyashqwkclifubjdkc
```

If a reflector is present, you will need to show the reflector and duplicate the rotors and Steckerboard, in the following order:

                Keyboard display
                Input        "
                Steckerboard "
                Rotors       "
                Reflector    "
                Rotors       " (reverse order)
                Steckerboard "
                Lamps        "
                Output       " (build characterwise)

The preceding display must have the following animation features (use Alphabets A, B, and C only):

1. The input, Steckerboard, and output each change with each input character. The input and output should scroll horizontally.
2. The appropriate keyboard, rotor, Steckerboard, reflector, and lamp symbols that correspond to the current input character are displayed in reverse video as that character is entered into ENIGMA.
3. For one extra point, implement a command-line option -k that allows manual entry of symbols (via your workstation keyboard), instead of reading the symbols from an input file or stdin.
4. The rotor symbols "light up" (reverse video) as specified by the various rotor transforms and gearing files.

Scoring: (a) 1 base point if you show the input, output at each rotor, and Steckerboard output in both the forward and reverse directions (i.e., before and after reflector), together with the output, (b) 2 base points if the display is implemented as shown above, and (c) 1 extra point for the keyboard input option (#3 in the preceding list).
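
The reverse-video highlighting can be produced with standard VT100 control sequences: ESC [ 7 m turns on reverse video, ESC [ 0 m turns attributes off, and ESC [ row ; col H moves the cursor. The sketch below formats one Keybd or Lamps row into a buffer so it can be printed at a fixed screen position; the function names and the 10-column label width are assumptions.

```c
#include <stdio.h>

#define ESC        "\033"
#define VT_REVERSE ESC "[7m"   /* reverse video on */
#define VT_NORMAL  ESC "[0m"   /* all attributes off */

/* Move the cursor to (row, col), both 1-based. */
void vt_goto(FILE *out, int row, int col)
{
    fprintf(out, ESC "[%d;%dH", row, col);
}

/* Format a row such as "Keybd:" or "Lamps:" for Alphabet A, showing letter
 * number hi (0-based) in reverse video; hi < 0 highlights nothing.
 * Returns the characters written, or a value >= size if buf was too small. */
int format_row(char *buf, size_t size, const char *label, int hi)
{
    int len = snprintf(buf, size, "%-10s", label);

    for (int k = 0; k < 26 && len >= 0 && (size_t)len < size; k++)
        len += snprintf(buf + len, size - (size_t)len,
                        k == hi ? VT_REVERSE "%c" VT_NORMAL " " : "%c ",
                        'A' + k);
    return len;
}
```

A redraw loop might call vt_goto(stdout, row, 1) followed by fputs(buf, stdout) for each row, after clearing the screen once with ESC "[2J".
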

Option 1d. Period Prediction:

Analyze and predict the rotor machine's period, with proof of correctness for your method. Implement your prediction strategy in a module or function that writes the predicted period to your program output.

In particular, you must have a -q switch on the command line, with the predicted period written to stdout. If you are planning on doing Option 1c, above, then you should include a slot for the predicted period in your display widget, to appear below the output line.

Scoring: 2 points.
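
A brute-force simulation is a useful way to test a predicted period on small configurations. The sketch below assumes the same one-driver-per-rotor gearing model as the gearfile (KEY or a rotor index, plus an advance amount), runs from the -x start ("000000") and counts key presses until every rotor is back at its start. That count is the period of the rotor-position sequence; the period of the resulting sequence of substitutions divides it. All names are hypothetical.

```c
#define MAXROTORS 8
#define KEY (-1)

/* Advance rotor j by k symbols and carry completed revolutions onward. */
static void step(int cnt[], const int driver[], const int amount[],
                 int n, int N, int j, int k)
{
    int total = cnt[j] + k;

    cnt[j] = total % N;
    if (total / N == 0)
        return;
    for (int d = 0; d < n; d++)
        if (driver[d] == j)
            step(cnt, driver, amount, n, N, d, (total / N) * amount[d]);
}

/* Return the number of key presses until all rotors return to their start
 * positions, or 0 if that does not happen within limit presses. */
long position_period(const int driver[], const int amount[], int n, int N,
                     long limit)
{
    int cnt[MAXROTORS] = {0};

    for (long t = 1; t <= limit; t++) {
        int back = 1;
        for (int j = 0; j < n; j++)
            if (driver[j] == KEY)
                step(cnt, driver, amount, n, N, j, amount[j]);
        for (int j = 0; j < n; j++)
            back &= (cnt[j] == 0);
        if (back)
            return t;
    }
    return 0;
}
```

For a 3-rotor standard odometer over Alphabet A this gives 26^3 = 17576, and a single rotor geared "4@K" returns after 26 / gcd(4, 26) = 13 presses, which are the kinds of values a prediction should reproduce.
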

List 2: Graduate Students - Rotor Machine Cryptanalysis Programs

Here follows a list of programs that will help you use the "toolkit" you built in Project #1 for cryptanalysis:

Program 2a. Kasiski Attack:

Expand the Kasiski starter program you did in Project #1 to attack the output of a small rotor machine. For example, you might limit your configuration to two rotors and an eight-character subset of Alphabet A, above. Use a reasonably sized plaintext corpus, and try several different gearing schemes. You might also find it useful to implement the Period Prediction (Option 1d from List 1, above), which will help you test the Kasiski result for validity.

Scoring: 1 point, since the Kasiski routine from Project #1 will be made available. Or, if you want to get started early, you may share code with someone else who has done the Kasiski routine.
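
For reference, the core of a Kasiski examination is sketched below; it is not the Project #1 routine. It takes the gcd of the distances between all repeated m-grams. If every repeat is genuine, the period divides this gcd; accidental repeats can pull it down, so in practice you would also tally the factors of the individual distances. The quadratic search is adequate for the small test configurations suggested above, and the function names are hypothetical.

```c
#include <string.h>

/* Greatest common divisor (Euclid); gcd(0, d) == d. */
static long gcd(long a, long b)
{
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Return the gcd of the distances j - i between every pair of positions
 * i < j holding the same m-gram, or 0 if no m-gram repeats. */
long kasiski_gcd(const char *text, size_t m)
{
    size_t L = strlen(text);
    long g = 0;

    if (m == 0 || L < m)
        return 0;
    for (size_t i = 0; i + m <= L; i++)
        for (size_t j = i + 1; j + m <= L; j++)
            if (memcmp(text + i, text + j, m) == 0)
                g = gcd(g, (long)(j - i));
    return g;
}
```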

Program 2b. Output Statistics Prediction:

Use the matrix-multiplication method described in class to implement automated prediction of your rotor machine's adjacency matrix (which describes the rotor connectivity graph at a given rotor position). Then, use the adjacency matrix and the input histogram to predict the output histogram, mean, standard deviation, mode, and median.

Hint: Once you produce the rotor machine's adjacency matrix, you have a bipartite graph, joining each input symbol to exactly one output symbol, that depicts a bijection. This graph depicts a map M. For example, given the rotor machine as an encryption device, you would relabel the histogram h of symbols in your plaintext corpus (including n-grams) by applying M to domain(h).
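
Relabelling by M is a matrix-vector product: out[t] = sum over s of h[s] * A[s][t], where A[s][t] is 1 if M maps s to t (or a weight, if A averages over rotor positions). The sketch below applies it and computes the summary statistics, treating the symbol index as the random variable (an assumption; the text does not fix this). It returns the variance; the standard deviation is sqrt(var) from <math.h>. Names are hypothetical.

```c
#define MAXN 64

struct hstats {
    double mean, var;   /* of the symbol index, weighted by count */
    int mode, median;
};

struct hstats predict_hist(const double h[], double A[][MAXN], int N,
                           double out[])
{
    struct hstats st = {0.0, 0.0, 0, 0};
    double total = 0.0, cum = 0.0;

    for (int t = 0; t < N; t++) {
        out[t] = 0.0;
        for (int s = 0; s < N; s++)
            out[t] += h[s] * A[s][t];          /* relabel input by M */
        total += out[t];
        st.mean += t * out[t];
        if (out[t] > out[st.mode])
            st.mode = t;                       /* first highest bin wins */
    }
    if (total <= 0.0)
        return st;                             /* empty histogram */
    st.mean /= total;
    for (int t = 0; t < N; t++)
        st.var += out[t] * (t - st.mean) * (t - st.mean);
    st.var /= total;
    for (st.median = 0; st.median < N - 1; st.median++) {
        cum += out[st.median];
        if (cum >= total / 2.0)
            break;                             /* first bin reaching half */
    }
    return st;
}
```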

Scoring: 1 point + 1 extra point if you implement display of the predicted output histogram for each change in the rotor position. Display will be implemented as a histogram that changes with each input symbol, located beneath the output line on Option 1c (List 1, above).

One additional point will be awarded (for a possible maximum of three points on this option) if you accumulate the output histogram for the entire input message and display that as your final histogram (with appropriate legend). Also, for the third point, you must display the mean, standard deviation, median, and mode of the final histogram.

Hint: This feature will be useful for analyzing the success of your attack on rotor machines that is based on maximum likelihood estimation.

Program 2c. Maximum Likelihood Attack:

Implement the maximum likelihood estimation (MLE) method described in class, together with the methods of digram- and trigram-based cryptanalysis of the Caesar and Vigenere ciphers that you developed for Project #1. Your goal is to successfully predict any given rotor machine configuration by performing the following steps:

Step 1.

Step 2. Initialize each rotor adjacency matrix to 1/|F|, which you may wish to perturb with small random values (< 1E-6) in order to "seed" your regression algorithm.
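
Step 2 can be sketched as follows, with N = |F|. The perturbation is drawn uniformly from [-eps, eps) with eps below 1E-6; rand() is used only for illustration (seed it with srand() for reproducible runs), and init_adjacency is a hypothetical name. If your regression expects row-stochastic matrices, renormalize each row afterwards.

```c
#include <stdlib.h>

#define MAXN 64

/* Set every entry of an N x N adjacency matrix to 1/N plus a small
 * signed perturbation of magnitude at most eps. */
void init_adjacency(double A[][MAXN], int N, double eps)
{
    for (int s = 0; s < N; s++)
        for (int t = 0; t < N; t++) {
            double u = (double)rand() / ((double)RAND_MAX + 1.0); /* [0,1) */
            A[s][t] = 1.0 / N + eps * (2.0 * u - 1.0);
        }
}
```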

Step 3. Devise an objective function f for comparing the histogram of your trial output (decryption result) with the known plaintext corpus statistics. For example, you might try the norm of the difference between histograms, or the differences between histogram parameters (e.g., mean, std-deviation, etc.). Or, you might prefer another measure (e.g., Mahalanobis distance) that you can discover in the literature of pattern recognition.
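
As one concrete choice for f (an assumption; the text leaves the measure open), the sketch below uses the squared Euclidean norm of the difference between the two histograms after each is normalized to sum to 1, so that message length does not matter. Smaller is better, and 0 means the histograms have identical shapes. The name hist_distance is hypothetical.

```c
#include <stddef.h>

/* Returns the objective value, or -1.0 if either histogram is empty. */
double hist_distance(const double cand[], const double corpus[], size_t N)
{
    double sc = 0.0, sk = 0.0, f = 0.0;

    for (size_t i = 0; i < N; i++) {
        sc += cand[i];
        sk += corpus[i];
    }
    if (sc <= 0.0 || sk <= 0.0)
        return -1.0;                     /* undefined for an empty histogram */
    for (size_t i = 0; i < N; i++) {
        double d = cand[i] / sc - corpus[i] / sk;
        f += d * d;
    }
    return f;
}
```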

Step 4. Apply the MLE technique we discussed in class, seeking at first to effect large perturbations in the merit function output (e.g., candidate plaintext histogram) by making small changes in the input (e.g., rotor adjacency matrix values).

Step 5. Refine your MLE approach to minimize the difference between known and candidate plaintext histograms. The method is as follows:

Step 6. As the algorithm begins to converge, test your candidate plaintext (Step 5c) using histograms of digrams, trigrams, and words separately and together. The adjacency matrices you obtain in Step 5d will be your best guess at the rotor machine's configuration that you produce by automated means. Be aware that the algorithm may not converge to zero difference. So, you will want to visually monitor the output as the algorithm iterates, until you have a candidate plaintext that has enough recognizable words and parts of words that you can guess the rest.

Step 7. Refine your best candidate plaintext by filling in symbols that the algorithm cannot guess. (Such incompleteness is usually due to various sources of noise or quantization error.) Your best approach here would be to complete the plaintext by hand, which is the method used in cryptanalytic practice.

Step 8. Using your ciphertext and candidate plaintext, refine your algorithm's "best guess" adjacency matrices obtained in Step 6 to be Boolean matrices. Then, program your rotor machine accordingly and use this to generate plaintext d from the ciphertext. Your plaintext d obtained from the rotor machine should match the plaintext you obtained in Step 7. If they don't match, check the settings on your rotor machine.
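
The conversion of a real-valued best-guess matrix to a Boolean (permutation) matrix can be sketched as below. This is a greedy heuristic that repeatedly takes the largest remaining entry whose row and column are both still free; it is not an optimal assignment (the Hungarian algorithm would maximize the total weight). The result is stored as perm[s] = t, the position of the single 1 in row s, and the function name is hypothetical.

```c
#define MAXN 64

void round_to_permutation(double A[][MAXN], int N, int perm[])
{
    int row_used[MAXN] = {0}, col_used[MAXN] = {0};

    for (int k = 0; k < N; k++) {
        int bs = -1, bt = -1;

        /* Find the largest entry among unused rows and columns. */
        for (int s = 0; s < N; s++) {
            if (row_used[s])
                continue;
            for (int t = 0; t < N; t++)
                if (!col_used[t] && (bs < 0 || A[s][t] > A[bs][bt])) {
                    bs = s;
                    bt = t;
                }
        }
        perm[bs] = bt;                     /* A'[bs][bt] = 1 */
        row_used[bs] = col_used[bt] = 1;
    }
}
```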

Hint: (1) Start with a simple rotor machine configuration, e.g., one rotor to begin with. (2) Then add rotors using the odometer gearing, so you know exactly the period of the cipher. (3) After that, you can progress to reverse odometer gearing and then to custom rotor configurations. Do not try to cryptanalyze a detailed rotor machine initially, or you will never finish the project.

Hint: To achieve convergence, we suggest you start with a simplex algorithm, which can be found in many textbooks, including Numerical Recipes (in the UF library). A least-squares regression approach is also useful, but could take longer to converge.

Scoring: 3 points.

Program 2d. Genetic Algorithm based Cryptanalysis:

An alternative approach to MLE-based cryptanalysis, somewhat harder to program and parameterize, uses genetic algorithms instead of the simplex method. This often yields faster convergence to a local minimum of the difference between the histograms of the corpus and the candidate plaintext, although, as with any stochastic search, convergence is not guaranteed.

Implement Program 2c in terms of a genetic algorithm (GA), rather than the methods described above. References for GAs will shortly be added to this Web page, and the technique will be discussed briefly in class, as well as on the Web page for rotor machines.

Hint: Note that the peaks of the plaintext histogram usually denote more significant characters. This is true for digrams, which in English often represent phonemes (a significant fact, since English is a phonetic language). Thus, when parameterizing your genetic algorithm, start with a one-rotor machine (monoalphabetic substitution). Partition the parameters of the rotor adjacency matrix according to the frequency-of-occurrence of the associated input characters. For example, you may want to set the partition between the second and third quartiles of the histogram to start. Continue this practice as you increase the number of rotors. Also consider using the output statistics prediction method discussed in Program 2b to predict where your partition should be located in the additional rotors' adjacency matrices.
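
The frequency-based partition in this hint can be sketched as follows, assuming the split is taken on the symbols ranked by frequency (frac = 0.5 splits the ranking between its second and third quartiles). high[s] = 1 marks the frequent partition, whose adjacency-matrix rows the GA might evolve first. The names are hypothetical, and the file-scope pointer used by the qsort() comparator makes the helper non-reentrant.

```c
#include <stdlib.h>

#define MAXN 64

static const double *g_freq;           /* histogram seen by the comparator */

/* Order symbol indices by decreasing frequency. */
static int by_freq_desc(const void *a, const void *b)
{
    double fa = g_freq[*(const int *)a], fb = g_freq[*(const int *)b];
    return (fa < fb) - (fa > fb);
}

/* Mark the most frequent fraction frac of the N symbols as high[s] = 1.
 * Returns the number of symbols in the high partition. */
int partition_by_frequency(const double freq[], int N, double frac, int high[])
{
    int order[MAXN];
    int cut = (int)(frac * N + 0.5);

    if (cut < 0)
        cut = 0;
    if (cut > N)
        cut = N;
    for (int s = 0; s < N; s++) {
        order[s] = s;
        high[s] = 0;
    }
    g_freq = freq;
    qsort(order, (size_t)N, sizeof order[0], by_freq_desc);
    for (int r = 0; r < cut; r++)
        high[order[r]] = 1;
    return cut;
}
```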

Scoring: 2 points, if done in addition to Program 2c.

Programming Hints.

Here are some hints to make life a little easier:

Documentation. Try to pattern your documentation exactly like the man pages in UNIX. You can work together to determine the format that you use, but please do your own pages. It would be nice if everyone used the same format. See the consultants if you have questions about the documentation for the nroff -man macro package.

Example Output. It would also be useful to have example input and output with each program you submit. That way, if the instructors can't get your program to work on our input, we can use your input and check it against your output.

References

- Genetic Algorithms:

Mitchell, M. Introduction to Genetic Algorithms, Cambridge, MA: MIT Press (1996).

Bach, T. Evolutionary Algorithms in Theory and Practice, New York: Oxford University Press (1995).

Morder, D.D. "Constrained Optimization of Smooth Functions Using a Genetic Algorithm", NASA Technical Paper #3329, NASA Langley Research Center (1995). [In UF Library Microforms, call number is NAS 1.60:3329]

Adeli, H. Machine Learning, New York: Wiley (1995).

Kau, C.L. "Genetic Algorithms Applied to Least-Squares Curve Fitting", US Bureau of Mines TR #9339. [in UF Science Library, call number is I 28.23:9339 -- also check microforms]

- Optimization and Simplex Algorithm:

Srina, V. Algorithms for Linear-Quadratic Optimization, New York: Marcel Dekker (1996).

Rolf, H.L. and G. Williams. Finite Mathematics, Dubuque, IA: W.C. Brown (1988).

Bellman, R. Perturbation Techniques in Mathematics, Physics, and Engineering, New York: Holt, Rinehart, and Winston (1966).

Bender, E.A. An Introduction to Mathematical Modelling, New York: John Wiley & Sons (1978).

Kevorkian, J. and J.D. Cole. Perturbation Methods in Applied Mathematics, New York: Springer-Verlag (1981).

Wan, F.Y.M. Mathematical Models and their Analysis, New York: Harper and Row (1989).

This concludes the description for Project #2. References will be included subsequently.