Cryptography > Monoalphabetic Ciphers (1/1) - How to Break Them (45+ min.)
Objectives:
1) Learn how to break any Monoalphabetic Cipher.

We are ending our study of monoalphabetic ciphers with the disillusioning fact that they can all be broken. The one perfectly secure cipher, the One-Time Pad, is rarely used because it is impractical. Before we continue studying more secure ciphers, it is my pleasure to explain the wonderful art of breaking monoalphabetic ciphers. You will learn how to use cryptanalytic tools such as letter frequencies, which will enable you to quickly solve the cryptograms that you may find in newspapers.

 

Break any MONOALPHABETIC CIPHER with the Aid of Letter Frequencies.

The Caesar Cipher, the Multiplication Cipher and the Linear Cipher have one property in common: they all fall into the category of Monoalphabetic Ciphers, in which "the same plain letter is always encoded to the same cipher letter." For example, in the Caesar Cipher with a shift of 3, each "a" turns into "d", each "b" turns into "e", etc.

The reason why such Ciphers can be broken is the following:
Although the letters are changed, the underlying letter frequencies are not! If the plain letter "a" occurs 10 times, its cipher letter will also occur 10 times. Therefore, ANY Monoalphabetic Cipher can be broken with the aid of letter frequency analysis, given enough cipher text. Let's analyze how the above Ciphers can be broken.

1) The keys of the Caesar and the Multiplication Cipher consist of one number. Thus, finding the cipher "e" is sufficient to break each Cipher. 

2) A key of the Linear Cipher, however, consists of the two numbers (a,b). Thus, finding two letter correspondences is sufficient to break it.

3) The most difficult monoalphabetic substitution cipher to break is the one where each plain letter is randomly assigned to its cipher letter. Instead of using encryption functions, we use tables to describe the plain-cipher letter correspondences. For example, each (capital) plain letter can be assigned to a cipher letter as follows:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
q a z w s x e d c r f v t g b y h n u j m i k o l p
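
Items 1) and 2) can be sketched in a few lines of Python. This is only a sketch under assumptions that the text does not state: letters are numbered a=0 ... z=25, the Caesar Cipher computes c = (p + k) mod 26, the Multiplication Cipher c = (a * p) mod 26, and the Linear Cipher c = (a * p + b) mod 26. The function names are made up for illustration. Rather than solving congruences, the sketch tries all valid keys and keeps those that agree with the known letter correspondences.

```python
from math import gcd

A = ord("a")  # assumption: letters are numbered a=0 ... z=25


def caesar_key_from_e(cipher_e: str) -> int:
    """Shift k of c = (p + k) mod 26, given the cipher letter that stands for e."""
    return (ord(cipher_e) - ord("e")) % 26


def multiplication_keys_from_e(cipher_e: str) -> list[int]:
    """All valid multipliers a (coprime to 26) that send e to cipher_e."""
    target = ord(cipher_e) - A
    return [a for a in range(1, 26)
            if gcd(a, 26) == 1 and (a * 4) % 26 == target]  # e has number 4


def linear_keys(pairs: list[tuple[str, str]]) -> list[tuple[int, int]]:
    """All keys (a, b) of c = (a*p + b) mod 26 that agree with the
    given (plain letter, cipher letter) pairs."""
    keys = []
    for a in range(1, 26):
        if gcd(a, 26) != 1:
            continue  # such an a would not give a decodable cipher
        for b in range(26):
            if all((a * (ord(p) - A) + b) % 26 == ord(c) - A for p, c in pairs):
                keys.append((a, b))
    return keys


print(caesar_key_from_e("h"))                     # 3
print(multiplication_keys_from_e("m"))            # [3]
print(linear_keys([("e", "c"), ("t", "z")]))      # [(5, 8)]
```

If the two known plain letters of a Linear Cipher happen to lie 13 positions apart, more than one key may survive, and a third correspondence is then needed to decide between them.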

Brute-force attacks on this random substitution cipher are hopeless, since there are 26! = 403291461126605635584000000, or about 4 * 10^26, possible ways to encode the 26 letters of the English alphabet. To crack the random substitution cipher, however, we take advantage of the fact that the underlying letter frequencies of the original plain text don't get lost. This fact makes the Random Substitution Cipher very susceptible to attack: an eavesdropper just needs to count the frequencies of the cipher letters. Recall that the most frequent letters in the English language are ETNORIA, which (except for the O) are also the most frequent letters in the brief virus carrier message. And the longer the message is, the more closely the relative frequencies of the cipher letters approach the expected frequencies.

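Both claims can be checked with a short Python sketch. It uses the substitution table given above as the key; the sample message and the helper names `encrypt` and `count_profile` are assumptions made up for illustration. The sketch encrypts a message, compares the sorted letter counts of plain text and cipher text, and computes the size of the key space.

```python
from collections import Counter
from math import factorial

PLAIN = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CIPHER = "qazwsxedcrfvtgbyhnujmikolp"  # second row of the table in the text
ENCRYPT = str.maketrans(PLAIN, CIPHER)


def encrypt(plaintext: str) -> str:
    """Replace every capital plain letter by its cipher letter."""
    return plaintext.upper().translate(ENCRYPT)


def count_profile(text: str) -> list[int]:
    """Letter counts in descending order, ignoring which letter is which."""
    return sorted(Counter(ch for ch in text if ch.isalpha()).values(),
                  reverse=True)


message = "ATTACK AT DAWN"  # hypothetical sample message
secret = encrypt(message)

print(secret)                                            # qjjqzf qj wqkg
print(count_profile(message))                            # [4, 3, 1, 1, 1, 1, 1]
print(count_profile(message) == count_profile(secret))   # True
print(factorial(26))                                     # 403291461126605635584000000
```

The letters change, but the list of counts does not, and that is all a frequency attack needs.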
Let's take a look at another message encrypted with a random substitution:

q v v   j d s   j q o s u   y q c w   a l   j d s
q i s n q e s   q t s n c z q g   q n s   u y s g j
a l   j d s   e b i s n g t s g j   c g   v s u u
j d q g   q   u s z b g w

How could we go about breaking this message? Certainly, we shall take advantage of the known letter frequencies. 

Step 1: 
Compute the letter frequencies of the cipher text. We first find the cipher E: the letter frequencies show that "s" is the most frequent cipher letter, so "s" most likely corresponds to E.
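
Counting by hand works, but Step 1 can also be sketched in Python. The cipher text is copied from the message above; the function name `letter_frequencies` is an assumption made up for illustration.

```python
from collections import Counter

CIPHERTEXT = (
    "qvv jds jqosu yqcw al jds "
    "qisnqes qtsnczqg qns uysgj "
    "al jds ebisngtsgj cg vsuu "
    "jdqg q uszbgw"
)


def letter_frequencies(text: str) -> list[tuple[str, int]]:
    """Cipher letters with their counts, most frequent first."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    return counts.most_common()


for letter, count in letter_frequencies(CIPHERTEXT)[:2]:
    print(letter, count)
# s 13
# q 10
```

With only 75 letters the counts are small, so the top letter is a strong hint rather than a proof; the following steps test the guess.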

Step 2: 
Secondly, we try to detect the most common English 3-letter word, "THE". We therefore look for repeated 3-letter sequences ending in "s". Even without the help of the blank spaces, we observe that jds occurs three times, more often than any other trigram. It is very likely that we have revealed the two correspondences j=T and d=H, yielding

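_ _ _   T H E   T _ _ E _   _ _ _ _   _ _   T H E
q v v   j d s   j q o s u   y q c w   a l   j d s
_ _ E _ _ _ E   _ _ E _ _ _ _ _   _ _ E   _ _ E _ T
q i s n q e s   q t s n c z q g   q n s   u y s g j
_ _   T H E   _ _ _ E _ _ _ E _ T   _ _   _ E _ _
a l   j d s   e b i s n g t s g j   c g   v s u u
T H _ _   _   _ E _ _ _ _
j d q g   q   u s z b g w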
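
The trigram search of Step 2 can be sketched as follows. The blank spaces are deliberately ignored, as in the text; the function name `trigram_counts` is an assumption made up for illustration.

```python
from collections import Counter

CIPHERTEXT = (
    "qvv jds jqosu yqcw al jds "
    "qisnqes qtsnczqg qns uysgj "
    "al jds ebisngtsgj cg vsuu "
    "jdqg q uszbgw"
)


def trigram_counts(text: str) -> Counter:
    """Count all overlapping 3-letter sequences, ignoring spaces."""
    letters = "".join(ch for ch in text.lower() if ch.isalpha())
    return Counter(letters[i:i + 3] for i in range(len(letters) - 2))


counts = trigram_counts(CIPHERTEXT)
repeated_ending_in_s = {t: n for t, n in counts.items()
                        if t.endswith("s") and n > 1}

print(counts.most_common(1))    # [('jds', 3)]
print(repeated_ending_in_s)     # {'jds': 3}
```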

The knowledge of the letters T, H and E reveals words like THERE, THIS, THAN, THUS, THAT, etc. So, let's look for them. We find jdqg, which could be THIS, THUS or THAN; however, it could not be THAT. Why not? Checking the frequency of q shows that it is likely to be one of the most common letters ETNORIA, and since it is a vowel (why?), we may reduce the possible choices for q to O, I or A. I and A are more likely than O to follow TH, and either one can form the one-letter word q, the second-to-last word of the message.
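
Testing a candidate word against a cipher word can be sketched in Python. The rule applied here is the defining property of a monoalphabetic cipher: one cipher letter always stands for the same plain letter, and two different cipher letters never stand for the same plain letter. The function name `consistent` and the list of candidates are assumptions made up for illustration.

```python
def consistent(cipher_word: str, candidate: str, known: dict[str, str]) -> bool:
    """True if reading cipher_word as candidate agrees with the known
    cipher-to-plain pairs and keeps the substitution one-to-one."""
    if len(cipher_word) != len(candidate):
        return False
    mapping = dict(known)
    for c, p in zip(cipher_word, candidate.upper()):
        if c in mapping:
            if mapping[c] != p:
                return False      # c already stands for another letter
        elif p in mapping.values():
            return False          # p already belongs to another cipher letter
        else:
            mapping[c] = p
    return True


known = {"s": "E", "j": "T", "d": "H"}
for word in ["THIS", "THUS", "THAN", "THAT", "THEN"]:
    print(word, consistent("jdqg", word, known))
# THIS True
# THUS True
# THAN True
# THAT False
# THEN False
```

THAT fails because its last letter would make g stand for T, which already belongs to j; THEN fails because E already belongs to s.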

Step 3: 
We now form possible words from the given letters and test whether the letters found make sense in other words. Say we choose q to be A. What could the first word, qvv, be? ALL and ADD are possible; ARE is not. If q were I, then ILL would be possible. A seems to be the more reasonable choice. Substituting A for q yields THA_ for jdqg. We know it cannot be THAT; therefore, THAN makes more sense. It is also plausible that g is N, since g is one of the most frequent cipher letters. Thus, using the correspondences A=q, N=g and L=v altogether yields

A L L   T H E   T A _ E _   _ A _ _   _ _   T H E
q v v   j d s   j q o s u   y q c w   a l   j d s
A _ E _ A _ E   A _ E _ _ _ A N   A _ E   _ _ E N T
q i s n q e s   q t s n c z q g   q n s   u y s g j
_ _   T H E   _ _ _ E _ N _ E N T   _ N   L E _ _
a l   j d s   e b i s n g t s g j   c g   v s u u
T H A N   A   _ E _ _ N _
j d q g   q   u s z b g w
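
Filling in the known letters by hand is error-prone, so this bookkeeping can be sketched in Python. The function name `partial_decrypt` is an assumption made up for illustration; unknown letters are shown as "_", as in the text.

```python
CIPHERTEXT = (
    "qvv jds jqosu yqcw al jds "
    "qisnqes qtsnczqg qns uysgj "
    "al jds ebisngtsgj cg vsuu "
    "jdqg q uszbgw"
)


def partial_decrypt(ciphertext: str, known: dict[str, str]) -> str:
    """Replace known cipher letters by plain letters and all others by '_'."""
    return "".join(known.get(ch, "_") if ch.isalpha() else ch
                   for ch in ciphertext)


# Correspondences found so far in Steps 1 to 3.
known = {"s": "E", "j": "T", "d": "H", "q": "A", "g": "N", "v": "L"}

for word in partial_decrypt(CIPHERTEXT, known).split():
    print(word)
```

Changing a single entry, for example setting q to "I" instead of "A", shows immediately whether the other words still look like English.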

Step 4:  
Now words begin to appear. vsuu looks very much like LESS; then S_ENT in uysgj should be SPENT. A_E in qns seems to be ARE. This is very likely, since the encrypted R, the n, appears frequently. We now have:

A L L   T H E   T A _ E S   P A _ _   _ _   T H E
q v v   j d s   j q o s u   y q c w   a l   j d s
A _ E R A _ E   A _ E R _ _ A N   A R E   S P E N T
q i s n q e s   q t s n c z q g   q n s   u y s g j
_ _   T H E   _ _ _ E R N _ E N T   _ N   L E S S
a l   j d s   e b i s n g t s g j   c g   v s u u
T H A N   A   S E _ _ N _
j d q g   q   u s z b g w
 

Continuing to detect words, we see that A_ERA_E in qisnqes looks like AVERAGE, and A_ER__AN in qtsnczqg looks very much like AMERICAN. What other words can you find? Try to finish it by yourself. Replacing these letters yields

A L L   T H E   T A _ E S   P A I _   _ _   T H E
q v v   j d s   j q o s u   y q c w   a l   j d s
A V E R A G E   A M E R I C A N   A R E   S P E N T
q i s n q e s   q t s n c z q g   q n s   u y s g j
_ _   T H E   G _ V E R N M E N T   I N   L E S S
a l   j d s   e b i s n g t s g j   c g   v s u u
T H A N   A   S E C _ N _
j d q g   q   u s z b g w
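
To finish the message it helps to know what is still open. A small Python sketch, using the correspondences found so far, lists the cipher letters that remain unsolved and the plain letters that are still available for them. The variable names are assumptions made up for illustration.

```python
CIPHERTEXT = (
    "qvv jds jqosu yqcw al jds "
    "qisnqes qtsnczqg qns uysgj "
    "al jds ebisngtsgj cg vsuu "
    "jdqg q uszbgw"
)

key = {
    "s": "E", "j": "T", "d": "H",                        # Steps 1 and 2
    "q": "A", "g": "N", "v": "L",                        # Step 3
    "u": "S", "y": "P", "n": "R",                        # LESS, SPENT, ARE
    "i": "V", "e": "G", "t": "M", "c": "I", "z": "C",    # AVERAGE, AMERICAN
}

cipher_letters = {ch for ch in CIPHERTEXT if ch.isalpha()}
open_cipher = sorted(cipher_letters - set(key))
unused_plain = sorted(set("ABCDEFGHIJKLMNOPQRSTUVWXYZ") - set(key.values()))

print(open_cipher)    # ['a', 'b', 'l', 'o', 'w']
print(unused_plain)   # ['B', 'D', 'F', 'J', 'K', 'O', 'Q', 'U', 'W', 'X', 'Y', 'Z']
```

Only five cipher letters are left, and each must stand for one of the twelve unused plain letters.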
 

And we finally have:

All the taxes paid by the average American are spent by the government in less than a second.
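
The result can be double-checked in Python. As it turns out, the example table given earlier (A=q, B=a, C=z, ...) is consistent with this message, so inverting that table decrypts it. The sketch below assumes exactly that table as the key.

```python
PLAIN = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CIPHER = "qazwsxedcrfvtgbyhnujmikolp"  # second row of the table in the text

CIPHERTEXT = (
    "qvv jds jqosu yqcw al jds "
    "qisnqes qtsnczqg qns uysgj "
    "al jds ebisngtsgj cg vsuu "
    "jdqg q uszbgw"
)

# Decryption is the table read from the bottom row to the top row.
DECRYPT = str.maketrans(CIPHER, PLAIN)
plaintext = CIPHERTEXT.translate(DECRYPT)

print(plaintext)
# ALL THE TAXES PAID BY THE AVERAGE AMERICAN ARE SPENT BY THE GOVERNMENT IN LESS THAN A SECOND
```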

 


Summary
Cracking Random Substitution Ciphers can be accomplished by a combination of finding the most frequent letters and trigrams as well as clever guessing and testing of missing letters. The more Random Substitution Ciphers you crack, the more experienced you will become. As a side effect, the so-called "cryptograms" that you find in newspapers are Random Substitution Ciphers, which you will now solve with ease.

 


 

 

Exercise 1:  
You now have the opportunity to practice the art of cracking such random substitution ciphers. Here is your cipher to crack:

yxdy pq yjc xzpvpyw ya icqdepzc ayjceq xq yjcw qcc yjcuqcvrcq. xzexjxu vpsdavs
Solution: tact is the ability to describe others as they see themselves. abraham lincoln

Exercise 2:
Crack the following "Cryptoclip" found in the "Daily News of the US Virgin Islands" on March 21, 2001.

vkbjp v avpq bphvu baj hfj bahjk hy yjxbjxfjq bc bdjxbr rjvpy hx baj fccujp

(Hint: the cipher "e" is j.)

Solution: after a hard trial the ice thief is sentenced to twenty years in the cooler

