# Cracking a Caesar Cipher

You're going to use _frequency analysis_ to crack a Caesar cipher,
recovering both the key and the plaintext.

## Caesar Ciphers

These are methods of encryption where you take the _plaintext_ (the
unencrypted text) and encrypt it by substituting one letter for another
to produce the _ciphertext_ (the encrypted text). (Strictly speaking, a
Caesar cipher shifts every letter by the same fixed amount; the
arbitrary letter-for-letter mapping used here is the more general
_substitution cipher_.)

For example, we might have the following mapping (which is the _key_ for
unlocking this cipher, not to be confused with a hash table key):

```
A -> H    B -> Z    C -> Y    D -> W    E -> O
F -> R    G -> J    H -> D    I -> P    J -> T
K -> I    L -> G    M -> L    N -> C    O -> E
P -> X    Q -> K    R -> U    S -> N    T -> F
U -> A    V -> M    W -> B    X -> Q    Y -> V
Z -> S
```

So if you have plaintext like `HELLO, WORLD!`, look up each letter in
the table above: `H` becomes `D`, `E` becomes `O`, and so on, producing
the ciphertext `DOGGE, BEUGW!`

To decode, just do the reverse: `D` becomes `H`, etc.

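As a quick illustration of encoding and decoding with a known key, here is a minimal Python sketch. It writes the example mapping as two aligned strings; the names `PLAIN`, `CIPHER`, `encrypt`, and `decrypt` are made up for this example and are not part of the challenge.

```python
# The example key, written as two aligned strings:
# PLAIN[i] encrypts to CIPHER[i] (A -> H, B -> Z, C -> Y, ...).
PLAIN = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CIPHER = "HZYWORJDPTIGLCEXKUNFAMBQVS"

# str.maketrans builds a character-to-character translation table.
ENCODE = str.maketrans(PLAIN, CIPHER)
DECODE = str.maketrans(CIPHER, PLAIN)


def encrypt(plaintext: str) -> str:
    """Substitute each letter using the key; other characters pass through."""
    return plaintext.translate(ENCODE)


def decrypt(ciphertext: str) -> str:
    """Apply the key in reverse to recover the plaintext."""
    return ciphertext.translate(DECODE)


print(encrypt("HELLO, WORLD!"))  # DOGGE, BEUGW!
print(decrypt("DOGGE, BEUGW!"))  # HELLO, WORLD!
```

Characters that are not in the table, such as spaces and punctuation, are left untouched by `str.translate`.
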
But what if you eavesdrop on some ciphertext but don't know the key (the
mapping)? How can you decode it?

## Frequency Analysis

Turns out, letters occur in the English language with a known frequency.
The letter `A` makes up 8.46% of all letters, for example.

(Disclaimer: these are not the actual frequencies in general English
prose. They're contrived for this specific challenge so that you get a
decent result, but they're quite close to the real percentages.)

| Letter | Percentage |
|:------:|-----------:|
| E | 11.53 |
| T | 9.75 |
| A | 8.46 |
| O | 8.08 |
| H | 7.71 |
| N | 6.73 |
| R | 6.29 |
| I | 5.84 |
| S | 5.56 |
| D | 4.74 |
| L | 3.92 |
| W | 3.08 |
| U | 2.59 |
| G | 2.48 |
| F | 2.42 |
| B | 2.19 |
| M | 2.18 |
| Y | 2.02 |
| C | 1.58 |
| P | 1.08 |
| K | 0.84 |
| V | 0.59 |
| Q | 0.17 |
| J | 0.07 |
| X | 0.07 |
| Z | 0.03 |

In other words, ordered from most frequently used to least, the letters
are:

```
'E', 'T', 'A', 'O', 'H', 'N', 'R', 'I', 'S', 'D', 'L', 'W', 'U',
'G', 'F', 'B', 'M', 'Y', 'C', 'P', 'K', 'V', 'Q', 'J', 'X', 'Z'
```

`E` is the most frequent letter. `Z` is the least frequent. And `M` is
somewhere in the middle.

So if you have a large enough block of ciphertext, you can count how
often each letter appears in it. And if `X` is the most frequent, then
it's a safe bet that the key includes this mapping:

```
E -> X
```

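The counting step can be sketched in Python as follows. This is only an illustration: the helper name `letters_by_frequency` and the sample string are invented for the example, and a string this short is far too small for the statistics to be reliable on real text.

```python
from collections import Counter


def letters_by_frequency(ciphertext: str) -> list[str]:
    """Return the uppercase letters found in ciphertext, most frequent first."""
    # Count only A-Z; spaces and punctuation are ignored.
    counts = Counter(ch for ch in ciphertext if "A" <= ch <= "Z")
    return [letter for letter, _ in counts.most_common()]


sample = "XQX XZ XQ"  # made-up ciphertext, for illustration only
ranked = letters_by_frequency(sample)

# The most frequent ciphertext letter is the best guess for plaintext E.
print(f"E -> {ranked[0]}")  # E -> X
```
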
## Challenge

Write a program that automatically finds the key for the ciphertext in
the file [`ciphertext.txt`](ciphertext.txt), then decodes it and shows
the plaintext.

(All non-letters should pass through the decoding as-is, i.e. spaces and
punctuation should be preserved. The input will not contain any
lowercase letters.)

No tests are provided for this one, but the result should be readable,
with at most a handful of incorrect letters.
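
One possible shape for such a program is sketched below in Python. It is a starting point under stated assumptions rather than a reference solution: it assumes the Nth most frequent ciphertext letter stands for the Nth most frequent English letter, and the names `ENGLISH_ORDER`, `guess_decode_table`, and `crack` are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Letters ordered from most to least frequent, per the frequency table.
ENGLISH_ORDER = "ETAOHNRISDLWUGFBMYCPKVQJXZ"


def guess_decode_table(ciphertext: str) -> dict[str, str]:
    """Map each ciphertext letter to a plaintext guess by frequency rank."""
    counts = Counter(ch for ch in ciphertext if "A" <= ch <= "Z")
    ranked = [letter for letter, _ in counts.most_common()]
    # Pair the Nth most common ciphertext letter with the Nth English letter.
    return dict(zip(ranked, ENGLISH_ORDER))


def crack(ciphertext: str) -> str:
    """Decode ciphertext using the frequency-based guess at the key."""
    table = guess_decode_table(ciphertext)
    # Non-letters are not in the table, so they pass through unchanged.
    return "".join(table.get(ch, ch) for ch in ciphertext)


if __name__ == "__main__":
    path = Path("ciphertext.txt")
    if path.exists():
        print(crack(path.read_text()))
```

The table built here maps ciphertext letters back to plaintext; inverting it gives the key in the `plaintext -> ciphertext` form used earlier. Letters with nearly equal counts may end up swapped, which is where the handful of incorrect letters tends to come from.
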