Pick programs worth a total of 10 points from List 1 and implement only one version of each program selected.
Pick programs worth a total of 10 points from List 2 and implement only one version of each program selected.
Read the Programming Hints at the end of this document to help make your life a little easier.
The following programs should read from stdin and write to stdout for the plaintext/ciphertext, reading the key from the command line; or prompt the user for the key and read it from stdin. Alternatively, you can prompt the user for the input and output file names and read them from stdin, reading and writing to files as specified by the user.
The symbol space for plaintext and ciphertext should be the same size, and should either be identical or should be [a..z] for the plaintext symbols space (denoted by P) and [A..Z] for the ciphertext symbol space (denoted by C). In the following options for P and C, the last four have P and C identical. The following are permissible alphabets for this programming assignment:
Be sure to specify which symbol set you are using in your programs. In points calculations, in addition to the points noted below for each cipher, you will get one additional point for implementing symbol set A or C, and two points for implementing symbol set B. This is for the whole assignment, not for each program, except as noted with individual programs.
To obtain credit, you must provide both encryption and decryption programs for each cipher you submit. Specify encryption with -e flag on the command line and decryption with -d flag on the command line. The key provided for decryption should be the same key used for encryption (so the program must compute the inverse key if needed from the encryption key).
Notation. In the following description, let p[i] denote the ith element of the plaintext, c[i] denote the ith element of the ciphertext, and k[i] denote the ith element of the key, starting with i = 0. Also let N be the size of the symbol set and L be the length of the key, with n(s) the numerical position of symbol s in the alphabet (n(s) is in [0..N-1]).
Here follow the undergraduate program choices:
The key is either a character from the symbol space (preferably plaintext) or a number for the shift, such that
c[i] = p[i] + k[0] modulo N .
Program 1b. Affine cipher:
The key is a pair of characters from the plaintext symbol space or a pair of numbers for the multiplier (a) and the shift (b), such that
c[i] = a · p[i] + b modulo N .
Scoring: 3 points.
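The following is a minimal sketch of the affine cipher, assuming P = [a..z], C = [A..Z], and numeric keys; the function names are hypothetical. Decryption takes the same key (a, b) as encryption and derives the inverse multiplier itself, as required above. A shift cipher is the special case a = 1.

```python
from math import gcd

P = "abcdefghijklmnopqrstuvwxyz"   # plaintext alphabet (assumed)
C = P.upper()                      # ciphertext alphabet (assumed)
N = len(P)

def check_multiplier(a):
    # a must be invertible modulo N, otherwise decryption is ambiguous.
    if gcd(a, N) != 1:
        raise ValueError("multiplier a must be coprime to N")

def affine_encrypt(text, a, b):
    check_multiplier(a)
    return "".join(C[(a * P.index(ch) + b) % N] for ch in text)

def affine_decrypt(text, a, b):
    check_multiplier(a)
    a_inv = pow(a, -1, N)  # modular inverse; needs Python 3.8 or later
    return "".join(P[(a_inv * (C.index(ch) - b)) % N] for ch in text)
```

For instance, affine_encrypt("hello", 5, 8) gives RCLLA, and affine_decrypt("RCLLA", 5, 8) recovers hello.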
Program 1c. Vigenere cipher or Extended Vigenere cipher:
Vigenere cipher - key is a word from plaintext. If key is of length L, with p[i] (c[i], k[i]) the ith element of the plaintext (ciphertext, key), then
c[i] = p[i] + k[i modulo L] modulo N .
Extended Vigenere cipher - key specification is a word w excerpted from the plaintext, but the extended key is w followed by the remainder of the plaintext symbols in the order they appear in the alphabet. It follows that you must omit those symbols already present in the keyword.
For example, if the keyword is DADDY, the extended key (for P = Alphabet A, above) is DADDYBCEFGHIJKLMNOPQRSTUVWXZ.
Note that the Vigenere cipher permits keywords with repeated letters, so the total extended key length may be greater than 26.
Scoring: 4 points, with 1 extra point for using alphabet B, specified above.
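One possible sketch of the Vigenere cipher and the extended key construction follows. It assumes a 26-letter upper-case symbol set shared by plaintext and ciphertext; the function names and the decrypt parameter are hypothetical.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed symbol set

def extended_key(word, alphabet=ALPHABET):
    # Keep the keyword as given (repeats included), then append
    # the alphabet symbols that do not occur in the keyword.
    return word + "".join(s for s in alphabet if s not in word)

def vigenere(text, key, decrypt=False, alphabet=ALPHABET):
    n = len(alphabet)
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text):
        k = alphabet.index(key[i % len(key)])   # k[i modulo L]
        out.append(alphabet[(alphabet.index(ch) + sign * k) % n])
    return "".join(out)
```

Here extended_key("DADDY") reproduces the extended key given in the example, and the same key string serves for both -e and -d.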
Program 1d. General substitution or Keyword approach substitution:
General substitution - key is a reordered version of entire plaintext alphabet (thus specifying the substitution), such that
c[i] = k[n(p[i])]
For example, if
Alphabet: abcdefghijklmnopqrstuvwxyz
Key:      JXKLYENSIACQBFGWHMPROTDUZV
then the plaintext hello becomes the ciphertext SYQQG.
Scoring: 3 points.
Keyword approach substitution cipher - if a key of length less than the alphabet size is given, then the full substitution key is this word followed by the rest of the plaintext symbols in the order they have in the plaintext alphabet. As in the Extended Vigenere cipher, you must omit those symbols already present in the keyword.
For example, if the keyword is FUBAR, then the extended keyword that specifies the substitution is given by FUBARCDEGHIJKLMNOPQSTVWXYZ.
In general, the keyword could be 26 (or 64, etc.) characters long, so an arbitrary substitution could be produced. If the keyword has repeated symbols in it, then only the first occurrence of each symbol in the keyword is used.
For example, if DADDY is the keyword, then the full substitution is the same as it would be for the keyword DAY, i.e., DAYBCEFGHIJKLMNOPQRSTUVWXZ.
Scoring: 4 points.
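The keyword expansion and the substitution itself can be sketched as below, assuming P = [a..z] and C = [A..Z]; the function names are hypothetical.

```python
PLAIN = "abcdefghijklmnopqrstuvwxyz"    # plaintext alphabet (assumed)
CIPHER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"   # ciphertext alphabet (assumed)

def substitution_key(keyword, alphabet=CIPHER):
    # First occurrence of each keyword symbol, then the unused
    # alphabet symbols in alphabet order.
    key = []
    for s in keyword + alphabet:
        if s not in key:
            key.append(s)
    return "".join(key)

def substitute(text, key, plain=PLAIN):
    # c[i] = k[n(p[i])]
    return "".join(key[plain.index(ch)] for ch in text)

def unsubstitute(text, key, plain=PLAIN):
    # Invert the table by looking up each ciphertext symbol in the key.
    return "".join(plain[key.index(ch)] for ch in text)
```

With these definitions, substitution_key("FUBAR") and substitution_key("DADDY") give the two full keys shown above, and the general-substitution example maps hello to SYQQG.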
Program 1e. Rotational stream cipher:
Each plaintext symbol is offset by a multiple of its position in the stream. The multiplier a is the input key, such that
c[i] = p[i] + (a · i) modulo N
Check for multiplier validity (as in affine ciphers).
Scoring: 3 points.
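A short sketch of the rotational stream cipher follows. It assumes a 26-letter symbol set and reads "multiplier validity" as gcd(a, N) = 1, which is an interpretation rather than something stated above; the function name is hypothetical.

```python
from math import gcd

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed symbol set

def rotational(text, a, decrypt=False, alphabet=ALPHABET):
    n = len(alphabet)
    if gcd(a, n) != 1:   # assumed validity test, as for affine multipliers
        raise ValueError("multiplier a must be coprime to N")
    sign = -1 if decrypt else 1
    # c[i] = p[i] + (a * i) modulo N; decryption subtracts the same offset.
    return "".join(
        alphabet[(alphabet.index(ch) + sign * a * i) % n]
        for i, ch in enumerate(text)
    )
```

For example, with a = 3 the plaintext AAAA becomes ADGJ, since the offsets are 0, 3, 6, 9.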
Program 1f. Simple columnar transposition cipher:
Give matrix dimensions a,b as the key. Read in a block of size ab in row major order and output it in column major order.
Note that if the plaintext length is not a multiple of the block length, then you will have to pad the plaintext. There are two ways to do this. You may use a low-frequency character (e.g., Q) for padding (in which case it may be confused with the plaintext). Alternatively, you may encode the length of the padding in the final symbol of the plaintext, and pad between the end of the actual plaintext and the symbol at the end of the last block with random symbols. The latter method provides better transparency, but has the unfortunate property that when the plaintext length is an exact multiple of the block size, an extra block of padding must be sent. It also limits the length of the block to be no more than the size of the input alphabet, but it avoids giving the cryptanalyst clues by the presence of several infrequent characters in the last block.
Scoring: 2 points, plus 1 extra point for using last character to specify length of pad for alphabets D or E. There are 2 extra points if using alphabets A, B, or C.
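The row-major/column-major step can be sketched as follows, using the first padding method (a fixed low-frequency character). The function name and the pad parameter are hypothetical, and the length-encoding pad is not shown.

```python
def columnar(text, a, b, pad="Q"):
    # Treat each block of a*b symbols as a rows by b columns,
    # filled in row-major order, and emit it in column-major order.
    size = a * b
    text += pad * (-len(text) % size)   # pad up to a whole block
    out = []
    for start in range(0, len(text), size):
        block = text[start:start + size]
        for c in range(b):
            for r in range(a):
                out.append(block[r * b + c])
    return "".join(out)
```

Under this sketch, decryption is the same routine with the dimensions swapped: columnar("ABCDEF", 2, 3) gives ADBECF, and columnar("ADBECF", 3, 2) gives ABCDEF again.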
Program 1g. Transposition cipher:
The key is numeric specification (vector of destination positions) of the transposition.
The numeric specification of a transposition of a block of size m is a sequence of length m that is a permutation of the numbers 0, 1, ..., m-1.
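For example, with a block size of 6, the plaintext thisistheplaintext is split into the blocks thisis thepla intext, which are transposed to IHSTSI LHPTAE XNEITT, giving the ciphertext IHSTSILHPTAEXNEITT.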
Scoring: 3 points, plus the same extra points from Program 1f, if you have not already done Program 1f. If you have done Program 1f, then extra points will not be awarded.
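A sketch of the block transposition follows, with the key read as a vector of destination positions. The function names are hypothetical, and the vector [3, 1, 5, 2, 0, 4] used in the usage note is inferred from the thisistheplaintext example rather than stated in the assignment. Padding is left to the caller.

```python
def check_permutation(dest):
    if sorted(dest) != list(range(len(dest))):
        raise ValueError("key must be a permutation of 0..m-1")

def transpose(text, dest):
    # Symbol i of each block moves to position dest[i] of the output block.
    check_permutation(dest)
    m = len(dest)
    if len(text) % m:
        raise ValueError("pad the text to a multiple of the block size first")
    out = []
    for start in range(0, len(text), m):
        block = [None] * m
        for i, d in enumerate(dest):
            block[d] = text[start + i]
        out.extend(block)
    return "".join(out)

def inverse_key(dest):
    # The decryption permutation is computed from the encryption key.
    check_permutation(dest)
    inv = [0] * len(dest)
    for i, d in enumerate(dest):
        inv[d] = i
    return inv
```

With dest = [3, 1, 5, 2, 0, 4], transpose("thisistheplaintext", dest) yields ihstsilhptaexneitt, and applying transpose with inverse_key(dest) restores the plaintext.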
Program 1h. Hill Cipher:
Specify an MxM-element matrix in row-major order as the input key (Hint: use hexadecimal input for alphabets D and E). Compute ciphertext as given in book, by multiplying plaintext blocks of size M symbols (which are treated as vectors) by the MxM-element Hill Cipher Matrix to obtain ciphertext blocks of size M.
Scoring: 4 points, plus the same extra points from Program 1f, if not already gained from doing Program 1f.
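The encryption half of the Hill cipher might be sketched as below. The book's convention (row vector times matrix, or matrix times column vector) is not restated here, so the sketch assumes c = K · p with p a column vector; the function name is hypothetical, and computing the inverse matrix modulo N for decryption is not shown.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed symbol set

def hill_encrypt(text, key, m, alphabet=ALPHABET):
    # key is the MxM matrix flattened in row-major order.
    n = len(alphabet)
    if len(key) != m * m:
        raise ValueError("key must have M*M entries")
    if len(text) % m:
        raise ValueError("pad the text to a multiple of M first")
    rows = [key[r * m:(r + 1) * m] for r in range(m)]
    out = []
    for start in range(0, len(text), m):
        vec = [alphabet.index(ch) for ch in text[start:start + m]]
        for row in rows:
            # One output symbol per matrix row: dot product modulo N.
            out.append(alphabet[sum(k * v for k, v in zip(row, vec)) % n])
    return "".join(out)
```

For example, with the 2x2 key 3 3 2 5 the plaintext HELP encrypts to HIAT under this convention.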
Program 1i. Squash:
Eliminate input symbols that are not in the valid plaintext/ciphertext alphabet.
Scoring: 1 point (for alphabets A, B, C only).
Compute factors of the ciphertext length, which is given as input (for use with simple transpositions).
Scoring: 1 point.
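Factoring a ciphertext length by trial division can be sketched as follows; the function name is hypothetical.

```python
def factors(n):
    # Collect divisor pairs (d, n // d) up to the square root of n.
    small, large = [], []
    d = 1
    while d * d <= n:
        if n % d == 0:
            small.append(d)
            if d != n // d:
                large.append(n // d)
        d += 1
    return small + large[::-1]   # ascending order
```

For a ciphertext of length 18, factors(18) returns [1, 2, 3, 6, 9, 18], which lists the candidate block dimensions for a simple transposition.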
Program 2b. Frequency analysis:
Given input text, produce (i) a listing of all symbols, (ii) the number of times each symbol occurred, (iii) the observed probability of occurrence for each symbol, and (iv) the total number of symbols detected. Install an option to output a vector of the probabilities or the frequencies in the order that the symbols occur in a given alphabet.
For example, if the alphabet is {A,B,C,D} and the plaintext is ABAABADDDAABAAD, then install an option to output either A 8, B 3, D 4 or 8 3 0 4.
Scoring: 2 points, plus 1 extra point if you include an option to sort output by frequency-of-occurrence (specified as -s on the command line). Symbols with equal frequencies must be sorted in lexicographical order (i.e., in the order that they appear in the alphabet you use).
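The counting, the vector output, and the -s ordering can be sketched as below; the function names and return layout are hypothetical.

```python
from collections import Counter

def frequency_report(text, alphabet):
    # Count only symbols that belong to the alphabet.
    counts = Counter(ch for ch in text if ch in alphabet)
    total = sum(counts.values())
    vector = [counts[s] for s in alphabet]           # alphabet order, zeros kept
    probs = [c / total if total else 0.0 for c in vector]
    return counts, vector, probs, total

def sorted_by_frequency(counts, alphabet):
    # Highest count first; ties broken by position in the alphabet.
    return sorted(counts.items(),
                  key=lambda kv: (-kv[1], alphabet.index(kv[0])))
```

For the alphabet {A,B,C,D} and the text ABAABADDDAABAAD, the vector is [8, 3, 0, 4] and the total is 15.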
Program 2c. Digram frequency analysis:
Given input text, produce (i) a listing of all digrams detected, (ii) the number of times each digram occurred, (iii) the observed probability of occurrence of each digram, and (iv) the total number of digrams seen. A nice way to display (i) - (iii) is to use two-dimensional tables, one for frequency-of-occurrence, and one for probability.
Scoring: 3 points, plus 1 extra point if you include an option (specified as -s on the command line) to sort output by frequency-of-occurrence. Digrams with equal frequencies must be sorted in lexicographical order (i.e., in the order that they appear in the plaintext alphabet). Thus, if AA and AC have equal frequencies, then AA would be output before AC.
Program 2d. Trigram frequency analysis:
Given input text, produce (i) a listing of all trigrams detected, (ii) the number of times each trigram occurred, (iii) the observed probability of occurrence of each trigram, and (iv) the total number of trigrams detected. It is better to use a one-dimensional list or a 2-D table of such a list here, rather than trying to enumerate all trigrams in a table (which will be quite sparse).
Scoring: 4 points, plus 1 extra point if you include an option (specified as -s on the command line) to sort output by frequency-of-occurrence. Trigrams with equal frequencies must be sorted in lexicographical order (i.e., in the order that they appear in the plaintext alphabet).
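Digram and trigram counting differ only in the window length, so one sketch can serve both. It assumes overlapping n-grams (a window that slides one symbol at a time), which the assignment does not specify; the function name and the by_frequency parameter are hypothetical.

```python
from collections import Counter

def ngram_report(text, n, alphabet, by_frequency=False):
    # Slide a window of length n across the text, one symbol at a time.
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())

    def alphabet_order(gram):
        return [alphabet.index(s) for s in gram]

    if by_frequency:   # the -s option: count descending, ties in alphabet order
        grams = sorted(counts, key=lambda g: (-counts[g], alphabet_order(g)))
    else:
        grams = sorted(counts, key=alphabet_order)
    rows = [(g, counts[g], counts[g] / total) for g in grams]
    return rows, total
```

Calling ngram_report(text, 2, alphabet) gives the digram listing and ngram_report(text, 3, alphabet) the trigram listing; each row holds the n-gram, its count, and its observed probability.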
Program 2e. Index of coincidence program:
Compute the IC given an input of frequencies of occurrence or observed probabilities for a set of symbols (i.e., the output of Program 2b).
Scoring: 2 points.
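One common form of the index of coincidence, computed from raw counts, is sketched below. The normalization used in class may differ (for example, the sum of squared probabilities when only probabilities are available), so treat the formula as an assumption; the function name is hypothetical.

```python
def index_of_coincidence(freqs):
    # IC = sum f(f-1) / (n(n-1)), with f the count of each symbol
    # and n the total number of symbols.
    n = sum(freqs)
    if n < 2:
        raise ValueError("need at least two symbols")
    return sum(f * (f - 1) for f in freqs) / (n * (n - 1))
```

For the frequency vector 8 3 0 4 from the Program 2b example, this gives 74/210, or about 0.352.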
Program 2f. Kasiski starter program: Identify any repeated character sequences of three or more characters and their locations in the text (output the repeated text and all the locations it was seen).
Scoring: 3 points, plus 1 extra if you output the relative locations for each repetition as an option, plus one extra point if you output the factors of the relative locations.
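A starting point for the Kasiski program is sketched below. It only looks at sequences of one fixed length (3 by default); longer repeats show up as runs of overlapping repeated trigrams, and merging them is left out. The function names are hypothetical.

```python
from collections import defaultdict

def repeated_sequences(text, length=3):
    # Record every starting position of every substring of the given length.
    positions = defaultdict(list)
    for i in range(len(text) - length + 1):
        positions[text[i:i + length]].append(i)
    # Keep only the sequences seen more than once.
    return {seq: pos for seq, pos in positions.items() if len(pos) > 1}

def relative_locations(pos):
    # Distances between successive occurrences of one sequence.
    return [b - a for a, b in zip(pos, pos[1:])]
```

In ABCXYZABCQQABC the sequence ABC occurs at positions 0, 6, and 11, so its relative locations are 6 and 5; the factors of those distances are the candidate key lengths.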
Program 2g. Smart de-transposition: Given input text, try locating common trigrams or words like "the", then use them to try to find a block size and transposition pattern.
Scoring: 4 points.
Program 2h. Explode/Combine:
Explode: A program to select a subset of the bytes of ciphertext (such as every 3rd letter) and send them to stdout. Command line args should be the modulus and the equivalence class (remainder). For example, if 3 1 are the command-line arguments, then every character in position i = 1 mod 3 would be sent to the output. The residue should be in [0..b-1], where b denotes the modulus.
Combine: Recombine the characters of several files by interleaving (take one from file A, then one from B, then one from C, then start over again). Files are specified in order on the command line.
Scoring: 3 points (for both).
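Explode and Combine can be sketched on in-memory strings as follows; reading from stdin and from the files named on the command line is left out, and the function names are hypothetical.

```python
from itertools import zip_longest

def explode(text, modulus, residue):
    # Keep the symbols at positions i with i = residue (mod modulus).
    if not 0 <= residue < modulus:
        raise ValueError("residue must be in [0..modulus-1]")
    return text[residue::modulus]

def combine(parts):
    # Interleave: one symbol from each part in turn, then start over.
    # Shorter parts simply run out (filled with the empty string).
    out = []
    for group in zip_longest(*parts, fillvalue=""):
        out.extend(group)
    return "".join(out)
```

With arguments 3 1, explode("ABCDEFGH", 3, 1) gives BEH, and combining the three streams for residues 0, 1, and 2 restores ABCDEFGH.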
Program 2i. Partial Solution Help for Substitution Cipher:
This program is similar to the decryption program for a substitution cipher, except it must allow the dash ("-") as a "don't know" symbol in the key. The program takes ciphertext input and a partial or complete key on the command line, and allows you to view partially decoded text as output.
Scoring: 3 points.
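A sketch of partial decryption follows, assuming the key has one entry per plaintext symbol (as in general substitution) with "-" wherever the ciphertext symbol is not yet known; the function name is hypothetical.

```python
def partial_decrypt(ciphertext, key,
                    plain="abcdefghijklmnopqrstuvwxyz", unknown="-"):
    # Map each known ciphertext symbol back to its plaintext symbol.
    lookup = {k: p for k, p in zip(key, plain) if k != unknown}
    # Ciphertext symbols without a known mapping are shown as "-".
    return "".join(lookup.get(ch, unknown) for ch in ciphertext)
```

If only h -> S and l -> Q are known so far, the ciphertext SYQQG is displayed as h-ll-.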
Program 2j. Partial Solution Help for Transposition Cipher:
This program is similar to the decryption program for a transpositional cipher, except it must allow the dash ("-") as a "don't know" symbol in the transposition key. The program takes ciphertext input and a partial key on the command line, and allows you to view partially decoded text as output.
Scoring: 3 points.
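The same idea for transposition is sketched below. The key format is an assumption: a list of destination positions given as strings, with "-" for positions not yet known. The function name is hypothetical.

```python
def partial_detranspose(ciphertext, dest, unknown="-"):
    # Plaintext symbol i of each block was sent to ciphertext position
    # dest[i], so it is read back from there when dest[i] is known.
    m = len(dest)
    out = []
    for start in range(0, len(ciphertext), m):
        block = ciphertext[start:start + m]
        for d in dest:
            if d == unknown or int(d) >= len(block):
                out.append(unknown)
            else:
                out.append(block[int(d)])
    return "".join(out)
```

For the ciphertext block IHSTSI and the partial key 3 1 - 2 - 4, the output is TH-S-S.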
Programming Hints.
Command-line flags: -e (-d) for an encryption (decryption) program; -a plaintext-alphabet (optional); -p plaintext; -c ciphertext; -k key; -i input-file (optional); and -o output-file (optional).
Document your programs in the style of man pages in UNIX. You can work together to determine the format that you use, but please do your own pages. It would be nice if everyone used the same format.
This concludes the description for Project #1.