The Cracking Code Book. Simon Singh
Чтение книги онлайн.

Читать онлайн книгу The Cracking Code Book - Simon Singh страница 7

Название: The Cracking Code Book

Автор: Simon Singh

Издательство: HarperCollins

Жанр: Книги для детей: прочее

Серия:

isbn: 9780007484997

isbn:

СКАЧАТЬ of whether O represents a vowel or a consonant. If O represents a vowel, it should appear before and after most of the other letters, whereas if it represents a consonant, it will tend to avoid many of the other letters. For example, the vowel e can appear before and after virtually every other letter, but the consonant t is rarely seen before or after b, d, g, j, k, m, q or v.

      The table below takes the three most common letters in the ciphertext, O, X and P, and lists how frequently each appears before or after every letter. For example, O appears before A on one occasion but never appears immediately after it, giving a total of one in the first box. The letter O neighbours the majority of letters, and there are only seven that it avoids completely, represented by the seven zeroes in the O row. The letter X is equally sociable, because it too neighbours most of the letters and avoids only eight of them. However, the letter P is much less friendly. It tends to lurk around just a few letters and avoids fifteen of them. This evidence suggests that O and X represent vowels, while P represents a consonant.

      Now we must ask ourselves which vowels are represented by O and X. They are probably e and a, the two most popular vowels in the English language, but does O = e and X = a, or does O = a and X = e? An interesting feature in the ciphertext is that the combination OO appears twice, whereas XX does not appear at all. Since the letters ee appear far more often than aa in plaintext English, it is likely that O = e and X = a.

      At this point, we have confidently identified two of the letters in the ciphertext. Our conclusion that X = a is supported by the fact that X appears on its own in the ciphertext, and a is one of only two English words that consist of a single letter. The only other letter that appears on its own in the ciphertext is Y, and it seems highly likely that this represents the only other one-letter English word, which is i. Focusing on words with only one letter is a standard cryptanalytic trick, and I have included it among a list of cryptanalytic tips in Appendix B. This particular trick works only because this ciphertext still has spaces between the words. Often, a cryptographer will remove all the spaces to make it harder for an enemy interceptor to unscramble the message.

      Although we have spaces between words, the following trick would also work where the ciphertext has been merged into a single string of characters. The trick allows us to spot the letter h once we have already identified the letter e. In the English language, the letter h frequently goes before the letter e (as in the, then, they, etc.), but rarely after e. The table below shows how frequently the O, which we think represents e, goes before and after all the other letters in the ciphertext. The table suggests that B represents h, because it appears before O on nine occasions but never goes after it. No other letter in the table has such an asymmetric relationship with O.

      Each letter in the English language has its own unique personality, which includes its frequency and its relation to other letters. It is this personality that allows us to establish the true identity of a letter, even when it has been disguised by monoalphabetic substitution.

      We have now confidently established four letters, O = e, X = a, Y = i and B = h, and we can begin to replace some of the letters in the ciphertext with their plaintext equivalents. I shall stick to the convention of keeping ciphertext letters in uppercase, while putting plaintext letters in lowercase. This will help to distinguish between those letters we still have to identify and those that have already been established.

       PCQ VMJiPD LhiK LiSe KhahJaWaV haV ZCJPe EiPD KhahJiUaJ LhJee KCPK. CP Lhe LhCMKaPV aPV liJKL PiDhL, QheP Khe haV ePVeV Lhe LaRe CI Sa’aJMI, Khe JCKe aPV EiKKev Lhe DJCMPV ZelCJe hiS, KaUiPD: “DJeaL EiPD, ICJ a LhCMKaPV aPV CPe PiDhLK i haNe ZeeP JeACMPLiPD LC UCM Lhe laZReK CI FaKL aDeK aPV Lhe ReDePVK CI aPAiePL EiPDK. SaU i SaEe KC ZCRV aK LC AJaNe a laNCMJ CI UCMJ SaGeKLU?”

       eFiRCDMe, LaReK IJCS Lhe LhCMKaPV aPV CPe PiDhLK

      This simple step helps us to identify several other letters, because we can guess some of the words in the ciphertext. For example, the most common three-letter words in English are the and and, and these are relatively easy to spot – Lhe, which appears six times, and aPV, which appears five times. Hence, L probably represents t, P probably represents n and V probably represents d. We can now replace these letters in the ciphertext with their true values:

       nCQ dMJinD thiK tiSe KhahjaWad had ZCJne EinD KhahJiUaJ thJee KCnK. Cn the thCMKand and MJKt niDht, Qhen Khe had ended the taRe CI Sa’aJMI, Khe JCKe and EiKKed the DJCMnd ZelCJe hiS, KaUinD: “DJeat EinD, ICJ a thCMKand and Cne niDhtK i haNe Zeen JeACMntinD tC UCM the laZReK CI FaKt aDeK and the ReDendK CI anAient EinDK. SaU i SaEe KC ZCRd aK tC AJaNe a laNCMJ CI UCMJ SaGeKtU?”

       eFiRCDMe, taReK IJCS the thCMKand and Cne niDhtK

      Once a few letters have been established, cryptanalysis progresses very rapidly. For example, the word at the beginning of the second sentence is Cn. Every word has a vowel in it, so C must be a vowel. There are only two vowels that remain to be identified, u and o; u does not fit, so C must represent o. We also have the word Khe, which implies that K represents either t or s. But we already know that L = t, so it becomes clear that K = s. Having identified these two letters, we insert them into the ciphertext, and there appears the phrase thoMsand and one niDhts. A sensible guess for this would be thousand and one nights, and it seems likely that the final line is telling us that this is a passage from Tales from the Thousand and One Nights. This implies that M = u, I = f, J = r, D = g, R = I and S = m.

      We could continue trying to establish other letters by guessing other words, but instead let us have a look at what we know about the plain alphabet and cipher alphabet. These two alphabets form the key, and they were used by the cryptographer to perform the substitution that scrambled the message. Already, by identifying the true values of letters in the ciphertext, we have effectively been working out the details of the cipher alphabet. A summary of our achievements, so far, is given in the plain and cipher alphabets below.

      

      By examining the partial cipher alphabet, we can complete the cryptanalysis. The sequence VOIDBY in the cipher alphabet suggests that the cryptographer has chosen a keyphrase as the basis for the key. Some guesswork is enough to suggest the keyphrase might be A VOID BY GEORGES PEREC, which is reduced to AVOIDBYGERSPC after removing spaces and repetitions. Thereafter, СКАЧАТЬ