knowledge - Guide to Codes and Ciphers

(Note: This is a guide, not a puzzle.)

Ciphers are quite common on Puzzling now, and at first they can seem quite confusing. But ciphers are bigger than this site: they are used worldwide by companies and secret services to encrypt data.

But what is a cipher, and what's the difference between a code and a cipher? What types of codes and ciphers are there and how can I make or solve them?

(The main purpose of this post is to help out newcomers to the site who may be a bit daunted or confused at the sight of ciphers, but who knows? An experienced user could learn something here too; I certainly did in my research :) )

Answer

This guide aims to explain various ciphers, help you understand how they work, and how to decode them with or without a key.

This answer is currently being split into multiple posts to improve scrollability and readability after some advice from other users. This may take a while, and apologies for the stop-start fashion of it.

Mission accomplished! This answer now links to separate posts for the different types of cipher, so the character limit no longer stops me from going into more detail, and you don't have to scroll as much. Thanks a lot to @n_palum for helping!

Index:

What is a cipher?
- Brief History
- Definition
- How to make a good one
- Difference between Codes and Ciphers

Types of cipher
- Classes and definitions
- Transposition ciphers
- Monoalphabetic Substitution ciphers
- Polygraphic Substitution ciphers
- Polyalphabetic ciphers
- Other ciphers
- Mechanical Ciphers

Cryptanalysis
- Frequency Analysis
- Index of Coincidence
- Kasiski Examination

Resources

Brief History

Ciphers have played major parts in historical events dating back to around 1900 BCE, where apparently nonsensical hieroglyphics can be found. From there, ciphers developed: a recipe was found encrypted on a tablet from 1500 BCE, and Hebrew scholars were using monoalphabetic ciphers in 600 BCE. Nowadays ciphers are common, with encryption used by companies, secret services and even everyday applications such as WhatsApp. They make the world a lot more secure, but what actually are these ciphers?

Definition

A cipher is, simply put, a way of hiding data by writing it in a disguised form. It is usually an algorithm that converts data into an unreadable form, stopping outside parties from obtaining the data and allowing only the intended recipient access.

A cipher involves at least two, and often three, pieces of data:

The plaintext - the message or data which shall be encoded

The key (not used by all ciphers) - a piece of data which is required to decode the ciphertext back into the plaintext

The ciphertext - the encoded plaintext which is usually illegible

The process of encryption is

Plaintext -> Method of encryption (type of cipher) + Key (if required) -> Ciphertext

Decryption is the reverse.
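To make that pipeline concrete, here is a minimal Python sketch using about the simplest keyed cipher there is, a Caesar shift, where the key is just the number of places each letter moves. The function names are illustrative, made up for this example, and passing non-letters through unchanged is a choice of this sketch.

```python
import string

ALPHABET = string.ascii_lowercase


# Illustrative helper names, not from any library.
def caesar_encrypt(plaintext: str, key: int) -> str:
    """Plaintext + key -> ciphertext: shift each letter forward by `key` places."""
    result = []
    for ch in plaintext.lower():
        if ch in ALPHABET:
            # Wrap around the end of the alphabet with modulo 26.
            result.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            # Spaces and punctuation are left as they are.
            result.append(ch)
    return "".join(result)


def caesar_decrypt(ciphertext: str, key: int) -> str:
    """Decryption is the reverse: shift each letter back by `key` places."""
    return caesar_encrypt(ciphertext, -key)


secret = caesar_encrypt("attack at dawn", 3)
print(secret)                     # dwwdfn dw gdzq
print(caesar_decrypt(secret, 3))  # attack at dawn
```

Without the key (3), the receiver would have to guess the shift, which is exactly where the cryptanalysis described later comes in.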

How to make a good one

On Puzzling, we don't want to just see a short string and be expected to solve it. For what to do and what not to do, see this meta post.

Difference between a Code and a Cipher

For everyone but cryptographers, the words code and cipher are synonymous. If you were to talk about codes and ciphers to someone you'd probably find they used the words interchangeably. But there is a difference.

Codes are everywhere, and most of the time you won't even notice them. A code replaces words, phrases or entire sentences with symbols or characters. The important thing here is that each set of symbols or characters has a meaning. These meanings are usually stored in a codebook. For instance, telegraph communicators used codes to convey messages more quickly; here is an extract from one of their codebooks:

You can see that single words on their own can stand for whole sentences.

Codes are very common, and you use them without even thinking. A traffic light uses a colour code for the words 'stop', 'wait' and 'go'. Most people, probably including you, use codes every day when chatting or texting things like 'brb', 'afaik' and 'idk'. One of the most widely used codes is ASCII (the American Standard Code for Information Interchange), which maps characters to numbers.

The point of codes isn't really to hide data, but to convert it into a form that is easier to transmit.

With a cipher, on the other hand, the ciphertext has no meaning of its own whatsoever. Each character is replaced according to an algorithm. For instance, despite its name, Morse code isn't a code; it's actually a cipher.

Most ciphers were invented to hide data.

The difference broken down:

Codes generally operate on semantics (meaning), while ciphers operate on syntax (symbols). A code is stored as a mapping in a codebook, while a cipher transforms individual symbols according to an algorithm.
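A small Python sketch of the distinction. The codebook entries are hypothetical, invented for this example (a real telegraph codebook would be far larger). The cipher half uses the international Morse alphabet, which, as mentioned above, works letter by letter rather than on meaning.

```python
# A code works on meaning: whole phrases are swapped for codewords.
# These codebook entries are hypothetical, made up for illustration.
CODEBOOK = {
    "ARRIVING TOMORROW": "BALMY",
    "SEND MONEY": "CRIMSON",
    "ALL IS WELL": "ZEPHYR",
}

# A cipher works on symbols: every letter is transformed by a fixed rule,
# whatever the message means. International Morse code for A-Z:
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}


def encode(phrase: str) -> str:
    """Look the whole phrase up in the codebook (KeyError if it isn't listed)."""
    return CODEBOOK[phrase.upper()]


def encipher_morse(message: str) -> str:
    """Replace each letter independently; words are separated by ' / '."""
    words = message.upper().split()
    return " / ".join(" ".join(MORSE[ch] for ch in word if ch in MORSE) for word in words)


print(encode("send money"))          # CRIMSON
print(encipher_morse("send money"))  # ... . -. -.. / -- --- -. . -.--
```

The codebook can only handle phrases it already lists, whereas the cipher can process any message made from its alphabet.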

Classes and definitions

There are two different categories of ciphers: Classical (pen and paper) and the more modern Mechanical (requires a machine).

There are several different classes of classical ciphers, as listed below:

Transposition ciphers - Positions of the characters in the plaintext change, but the characters themselves remain the same

Monoalphabetic substitution ciphers - Each character (usually, though not always) is replaced with a different character or group of characters

Polygraphic substitution ciphers - Groups of characters are replaced

Polyalphabetic ciphers - Characters are encoded using several different alphabets, usually chosen according to each character's position.

Others - Completely different, or a combination of the above classes

There are a few mechanical ciphers, which I will write a brief note on after the classical ciphers below.

Transposition ciphers

Transposition ciphers involve moving the characters in the plaintext to different positions using an algorithm. The characters themselves remain unchanged, making this type of cipher insecure for short plaintexts.

See this separate answer for more details on different types of transposition ciphers.
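For a feel of how a transposition works, here is a Python sketch of one well-known example, the rail fence cipher: the plaintext is written in a zigzag across a number of "rails" and then read off rail by rail. The choice of cipher and the function names are illustrative assumptions; the separate answer covers the other transposition ciphers.

```python
# Illustrative helper names, not from any library.
def zigzag(length: int, rails: int) -> list:
    """Return the rail (row) each position lands on when writing in a zigzag."""
    pattern, row, step = [], 0, 1
    for _ in range(length):
        pattern.append(row)
        if rails > 1:
            if row == 0:
                step = 1
            elif row == rails - 1:
                step = -1
            row += step
    return pattern


def rail_fence_encrypt(plaintext: str, rails: int) -> str:
    pattern = zigzag(len(plaintext), rails)
    # Stable sort: positions grouped by rail, left to right within each rail.
    order = sorted(range(len(plaintext)), key=lambda i: pattern[i])
    return "".join(plaintext[i] for i in order)


def rail_fence_decrypt(ciphertext: str, rails: int) -> str:
    pattern = zigzag(len(ciphertext), rails)
    order = sorted(range(len(ciphertext)), key=lambda i: pattern[i])
    plain = [""] * len(ciphertext)
    # Put each ciphertext character back at the position it was read from.
    for ch, i in zip(ciphertext, order):
        plain[i] = ch
    return "".join(plain)


print(rail_fence_encrypt("WEAREDISCOVEREDFLEEATONCE", 3))
# WECRLTEERDSOEEFEAOCAIVDEN
```

The ciphertext contains exactly the same letters as the plaintext, only reordered, so its letter frequencies look like ordinary English.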

Monoalphabetic substitution ciphers

Monoalphabetic substitution ciphers replace each letter in the plaintext with a different character/group of characters. If the plaintext is lengthy then these can be easily broken by frequency analysis.

See this separate answer for more details on different types of monoalphabetic substitution ciphers.
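A minimal Python sketch of one common variant, a keyword substitution cipher, where the cipher alphabet starts with the keyword (duplicates removed) followed by the remaining letters. The keyword and function names are just illustrative choices.

```python
import string

LOWER = string.ascii_lowercase


# Illustrative helper names, not from any library.
def keyed_alphabet(keyword: str) -> str:
    """Keyword letters first (duplicates dropped), then the rest of a-z in order."""
    seen = []
    for ch in keyword.lower() + LOWER:
        if ch in LOWER and ch not in seen:
            seen.append(ch)
    return "".join(seen)


def substitute(text: str, keyword: str) -> str:
    # Every plaintext letter always maps to the same ciphertext letter.
    table = str.maketrans(LOWER, keyed_alphabet(keyword))
    return text.lower().translate(table)


def unsubstitute(text: str, keyword: str) -> str:
    # Decryption uses the same table in the opposite direction.
    table = str.maketrans(keyed_alphabet(keyword), LOWER)
    return text.lower().translate(table)


print(keyed_alphabet("zebras"))              # zebrascdfghijklmnopqtuvwxy
print(substitute("flee at once", "zebras"))  # siaa zq lkba
```

Because 'e' always becomes 'a' here, the ciphertext inherits the letter frequencies of the plaintext, which is exactly what frequency analysis exploits.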

Polyalphabetic Substitution Ciphers

Polyalphabetic substitution ciphers replace characters in the plaintext with characters or groups of characters from several alternate alphabets, switching between them as the text progresses.

See this separate answer for more details on different types of polyalphabetic substitution ciphers.
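The best-known polyalphabetic cipher is the Vigenère cipher, where each letter of a repeating keyword selects a different Caesar-shifted alphabet. Here is a minimal Python sketch; the function name and the choice to drop non-letters are assumptions for this example.

```python
import string

UPPER = string.ascii_uppercase


# Illustrative helper name, not from any library.
def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift each letter by the matching key letter (A=0 ... Z=25).

    Non-letters are dropped; the key is repeated to cover the whole text.
    """
    key = key.upper()
    out = []
    i = 0  # position in the key; only advances on letters
    for ch in text.upper():
        if ch not in UPPER:
            continue
        shift = UPPER.index(key[i % len(key)])
        if decrypt:
            shift = -shift
        out.append(UPPER[(UPPER.index(ch) + shift) % 26])
        i += 1
    return "".join(out)


print(vigenere("attack at dawn", "lemon"))              # LXFOPVEFRNHR
print(vigenere("LXFOPVEFRNHR", "lemon", decrypt=True))  # ATTACKATDAWN
```

The two T's in ATTACK become X and F because they fall under different key letters, which is what flattens the letter frequencies.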

Polygraphic Ciphers

Polygraphic ciphers involve having groups of characters in the plaintext replaced.

See this separate answer for more details on different types of polygraphic ciphers.
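One classic polygraphic cipher is the Hill cipher, which encrypts pairs of letters at once by multiplying them by a key matrix modulo 26. Below is a minimal Python sketch for 2x2 keys. The key matrix is an arbitrary example (it must be invertible mod 26), padding odd-length text with X is an assumption, and `pow(x, -1, 26)` needs Python 3.8 or later.

```python
import string

UPPER = string.ascii_uppercase
KEY = [[3, 3], [2, 5]]  # example key; its determinant 9 is coprime with 26


# Illustrative helper names, not from any library.
def hill_apply(text: str, key) -> str:
    """Multiply each pair of letters (as numbers 0-25) by the 2x2 key matrix."""
    nums = [UPPER.index(c) for c in text.upper() if c in UPPER]
    if len(nums) % 2:
        nums.append(UPPER.index("X"))  # pad to an even length
    out = []
    for i in range(0, len(nums), 2):
        x, y = nums[i], nums[i + 1]
        out.append(UPPER[(key[0][0] * x + key[0][1] * y) % 26])
        out.append(UPPER[(key[1][0] * x + key[1][1] * y) % 26])
    return "".join(out)


def inverse_key(key):
    """Inverse of a 2x2 matrix modulo 26 (raises ValueError if none exists)."""
    (a, b), (c, d) = key
    det_inv = pow((a * d - b * c) % 26, -1, 26)
    return [[(d * det_inv) % 26, (-b * det_inv) % 26],
            [(-c * det_inv) % 26, (a * det_inv) % 26]]


ciphertext = hill_apply("HELP", KEY)
print(ciphertext)                                # HIAT
print(hill_apply(ciphertext, inverse_key(KEY)))  # HELP
```

Each ciphertext letter depends on both letters of its pair, so counting single letters no longer maps neatly back to the plaintext.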

Other ciphers

Other ciphers are out there and many don't fit into any of the above categories. They can be combination ciphers, combining elements above to make them stronger, or just be completely different.

See this separate answer for more details on different types of other ciphers.

Also, see this community wiki of other ciphers that have been missed out, and feel free to add to it!

Mechanical ciphers

Mechanical ciphers came into wide use in the first half of the 20th century, most famously during WWII. They rely on gearing mechanisms to shift letters through an alphabet to produce the final message.

The most famous examples are the Enigma machine and the Lorenz machine. I won't be able to explain a machine very well, so I won't go into detail here. See the links for more, or this list on Wikipedia.
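To give a flavour of the stepping idea without modelling a real machine, here is a toy single-rotor sketch in Python: a fixed scrambled alphabet (the "wiring") whose position advances by one after every letter, so the same plaintext letter is enciphered differently each time. This is a deliberate simplification (no reflector, plugboard, extra rotors or ring settings) and is not how Enigma or Lorenz worked in detail; the wiring string is simply an arbitrary permutation of A-Z.

```python
import string

UPPER = string.ascii_uppercase
# Toy wiring: an arbitrary permutation of A-Z (assumption for this sketch).
WIRING = "EKMFLGDQVZNTOWYHXUSPAIBRCJ"


# Hypothetical toy function, not a model of any real machine.
def toy_rotor(text: str, start: int = 0, decrypt: bool = False) -> str:
    out = []
    pos = start
    for ch in text.upper():
        if ch not in UPPER:
            continue
        if not decrypt:
            # Enter the rotor at the offset contact, follow the wiring, exit.
            wired = WIRING[(UPPER.index(ch) + pos) % 26]
            out.append(UPPER[(UPPER.index(wired) - pos) % 26])
        else:
            # Follow the same path backwards through the wiring.
            contact = UPPER[(UPPER.index(ch) + pos) % 26]
            out.append(UPPER[(WIRING.index(contact) - pos) % 26])
        pos = (pos + 1) % 26  # step the rotor after every letter
    return "".join(out)


secret = toy_rotor("AAAA")
print(secret)                           # EJKC
print(toy_rotor(secret, decrypt=True))  # AAAA
```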

Cryptanalysis

There are many ways to attempt to break a cipher without a key. Here are some of the most useful (taken from my answer here):

Cryptanalysis is defined as

'the art or process of deciphering coded messages without being told the key.'

If you have the key and know the encryption method, you can simply reverse the process to get to the plaintext.

If you have the key but not the encryption method, then this question covers how you can identify the cipher.

However, if you have neither the key nor the encryption method, you can use cryptanalysis.

Depending on how much you manage to work out, this can achieve:

Total break - working out both the key and the plaintext.

Global deduction - discovering the method of encryption and finding the plaintext, but not the key.

Distinguishing algorithm - being able to tell the cipher apart from a random permutation.

There are several different ways to go about solving ciphers:

Frequency Analysis

Frequency analysis studies how often each letter appears in a ciphertext. It works best with substitution and rotation (ROT/Caesar) ciphers, even though both of those can use keys.

In English, the letters ordered from most to least frequent are roughly etaoinshrdlcumwfgypbvkjxqz (the exact order varies slightly with the text sampled).

Here are the statistics for the English language, including unigrams, bigrams, trigrams, etc.

As you can see from this graph, 'e' is by far the most frequent letter, while 't' to 'r' are much closer together.

How to use

If the cipher is a substitution cipher and the ciphertext is fairly long, you can attempt to break it.

Using an online tool such as this, you can find the most common letters and the most frequent substrings.

The most frequent letter in the ciphertext probably stands for 'e', the next most frequent for 't', and so on.

Using this you can break the cipher, or at least get an almost correct plaintext from which you can deduce the correct one.
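If you would rather not rely on an online tool, the counting step is easy to do yourself. A minimal Python sketch using `collections.Counter`; the function name is made up for this example, and it counts only single letters and two-letter substrings (bigrams).

```python
from collections import Counter


# Illustrative helper name, not from any library.
def frequency_report(ciphertext: str, top: int = 5):
    """Return the most common letters and two-letter substrings (bigrams)."""
    letters = [c for c in ciphertext.lower() if c.isalpha()]
    joined = "".join(letters)
    letter_counts = Counter(letters)
    bigram_counts = Counter(joined[i:i + 2] for i in range(len(joined) - 1))
    return letter_counts.most_common(top), bigram_counts.most_common(top)


letters, bigrams = frequency_report("meet me at the meeting point")
print(letters)
print(bigrams)
```

Run it on a real ciphertext and compare the top letters with the etaoin... order above.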

Example

Here is an example found online. We know it is a ROT (Caesar shift) cipher, but not which shift was used:

ymnxhtzwxjfnrxytuwtanijdtzbnymijyfnqjipstbqjiljtknrutwyfsyyjhmstqtlnjxfsifuuqnhfyntsymfyfwjzxjinsymjnsyjwsjy

Most common letters:

j = 13, y = 13, n = 11, t = 10

So we can assume that either j or y stands for e. If j = e, then j is 5 letters after e, so this is probably ROT5. Decoding using ROT21 (the reverse) gives:

thiscourseaimstoprovideyouwithdetailedknowledgeofimportanttechnologiesandapplicationthatareusedintheinternet

So we have solved it by guessing just one letter.
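The same reasoning can be automated. This Python sketch (the function names are made up for this example) finds every letter tied for most common, assumes each in turn stands for 'e', and decodes with the matching shift. Handling ties matters here: j and y both appear 13 times, so picking only one of them could land on the wrong shift.

```python
from collections import Counter
import string

LOWER = string.ascii_lowercase

CIPHERTEXT = "ymnxhtzwxjfnrxytuwtanijdtzbnymijyfnqjipstbqjiljtknrutwyfsyyjhmstqtlnjxfsifuuqnhfyntsymfyfwjzxjinsymjnsyjwsjy"


# Illustrative helper names, not from any library.
def shift(text: str, k: int) -> str:
    """Move every letter k places along the alphabet (k may be negative)."""
    return "".join(LOWER[(LOWER.index(c) + k) % 26] if c in LOWER else c for c in text)


def guess_rot(ciphertext: str) -> dict:
    """For each letter tied for most common, assume it is 'e' and decode."""
    counts = Counter(c for c in ciphertext if c in LOWER)
    top = max(counts.values())
    guesses = {}
    for letter, n in counts.items():
        if n == top:
            key = (LOWER.index(letter) - LOWER.index("e")) % 26
            guesses[letter] = (key, shift(ciphertext, -key))
    return guesses


for letter, (key, text) in guess_rot(CIPHERTEXT).items():
    print(f"{letter} = e -> ROT{key}: {text[:30]}...")
```

Only the j guess (ROT5) produces readable English, matching the working above.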

This method really works best with a quite lengthy ciphertext and is almost useless with short ciphertexts.

Index of coincidence

The index of coincidence measures how likely it is that two letters drawn at random from a text are the same letter. For a text of N letters in which the i-th letter of the alphabet appears n_i times, Wikipedia gives the calculation as

IC = c × Σ n_i(n_i - 1) / (N(N - 1))

where c is the number of letters in the alphabet (26 for English). Multiplying by c normalizes the value so that uniformly random text scores about 1, while typical English scores about 1.73.
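The formula translates directly into code. A minimal Python sketch; the function name is made up, and only the 26 letters A-Z are counted (everything else is ignored).

```python
from collections import Counter
import string


# Illustrative helper name, not from any library.
def index_of_coincidence(text: str, alphabet_size: int = 26) -> float:
    """Normalised IC: c * sum(n_i * (n_i - 1)) / (N * (N - 1))."""
    letters = [ch for ch in text.upper() if ch in string.ascii_uppercase]
    n = len(letters)
    if n < 2:
        return 0.0  # not enough letters to draw a pair
    counts = Counter(letters)
    matching_pairs = sum(k * (k - 1) for k in counts.values())
    return alphabet_size * matching_pairs / (n * (n - 1))


print(index_of_coincidence("the quick brown fox jumps over the lazy dog"))
```

Very short samples give noisy values, so the comparison with 1.73 only becomes meaningful on reasonably long texts.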

How to use

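The idea is to split the ciphertext into rows of x letters and stack them, so that each column holds every x-th letter. If x equals the key length (or a multiple of it), every letter in a column was enciphered with the same shift, so each column behaves like ordinary English and its I.C. will be around 1.73 (the index of coincidence of English). Otherwise, the columns mix several shifts and the I.C. will be closer to 1.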

Example

(From Wikipedia)

We have the following ciphertext:

QPWKA LVRXC QZIKG RBPFA EOMFL JMSDZ VDHXC XJYEB IMTRQ WNMEA IZRVK CVKVL XNEIC FZPZC ZZHKM LVZVZ IZRRQ WDKEC HOSNY XXLSP MYKVQ XJTDC IOMEE XDQVS RXLRL KZHOV

We can guess that this is a Vigenère cipher with a short key and that the plaintext is English. We can stack the ciphertext in rows of, say, 3 letters (or any other number):

QPW
KAL
...

So if the key length is x, the average I.C. of the columns should be around 1.73. Calculating this for every key length from 1 to 10, 5 and 10 come out closest to 1.73, and as 10 is a multiple of 5, the key length is most likely 5.
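Here is a Python sketch of that search: it stacks the ciphertext above in rows of each candidate length and averages the index of coincidence of the columns. Averaging the column values is one reasonable way to combine them and is an assumption of this sketch, so other tools may report slightly different numbers.

```python
from collections import Counter
import string

UPPER = string.ascii_uppercase

CIPHERTEXT = (
    "QPWKA LVRXC QZIKG RBPFA EOMFL JMSDZ VDHXC XJYEB IMTRQ WNMEA "
    "IZRVK CVKVL XNEIC FZPZC ZZHKM LVZVZ IZRRQ WDKEC HOSNY XXLSP "
    "MYKVQ XJTDC IOMEE XDQVS RXLRL KZHOV"
)


# Illustrative helper names, not from any library.
def index_of_coincidence(letters) -> float:
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return 26 * sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


def average_column_ic(ciphertext: str, period: int) -> float:
    """Stack the text in rows of `period` letters and average the columns' IC."""
    letters = [c for c in ciphertext.upper() if c in UPPER]
    columns = [letters[i::period] for i in range(period)]
    return sum(index_of_coincidence(col) for col in columns) / period


for period in range(1, 11):
    print(period, round(average_column_ic(CIPHERTEXT, period), 2))
```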

Next, stack the ciphertext in rows of 5 and use frequency analysis on each column to find the key. When we try this, the best-fit key letters for the columns are "EVERY". A Vigenère decoder then gives the message:

MUST CHANGE MEETING LOCATION FROM BRIDGE TO UNDERPASS SINCE ENEMY AGENTS ARE BELIEVED TO HAVE BEEN ASSIGNED TO WATCH BRIDGE STOP MEETING TIME UNCHANGED XX
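A Python sketch of that last step: for each of the 5 columns, try all 26 shifts and keep the one whose letter counts best match English according to a chi-squared score, then decrypt. The chi-squared scoring and the frequency table (commonly quoted approximate English percentages) are assumptions of this sketch. With only 26 letters per column the best fit can occasionally be wrong, so always check that the result reads as English.

```python
import string

UPPER = string.ascii_uppercase

# Commonly quoted approximate English letter frequencies in percent, A to Z.
ENGLISH = [8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
           0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
           6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074]

CIPHERTEXT = (
    "QPWKA LVRXC QZIKG RBPFA EOMFL JMSDZ VDHXC XJYEB IMTRQ WNMEA "
    "IZRVK CVKVL XNEIC FZPZC ZZHKM LVZVZ IZRRQ WDKEC HOSNY XXLSP "
    "MYKVQ XJTDC IOMEE XDQVS RXLRL KZHOV"
)


# Illustrative helper names, not from any library.
def best_shift(column) -> int:
    """Return the shift whose removal makes the column look most like English."""
    n = len(column)
    best_k, best_score = 0, float("inf")
    for k in range(26):
        counts = [0] * 26
        for c in column:
            counts[(UPPER.index(c) - k) % 26] += 1
        # Chi-squared distance between observed and expected letter counts.
        score = sum((counts[i] - n * ENGLISH[i] / 100) ** 2 / (n * ENGLISH[i] / 100)
                    for i in range(26))
        if score < best_score:
            best_k, best_score = k, score
    return best_k


def recover_key(ciphertext: str, period: int) -> str:
    letters = [c for c in ciphertext.upper() if c in UPPER]
    return "".join(UPPER[best_shift(letters[i::period])] for i in range(period))


def vigenere_decrypt(ciphertext: str, key: str) -> str:
    letters = [c for c in ciphertext.upper() if c in UPPER]
    return "".join(UPPER[(UPPER.index(c) - UPPER.index(key[i % len(key)])) % 26]
                   for i, c in enumerate(letters))


key = recover_key(CIPHERTEXT, 5)
print(key)
print(vigenere_decrypt(CIPHERTEXT, key))
```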

Kasiski Examination

The Kasiski Examination is another way of deducing the key length. It works best with longer ciphertexts, although these usually need a computer to process.

The Kasiski Examination finds repeated strings in the ciphertext and the distances between them. These distances are likely to be multiples of the key length, so the more repeated strings you find, the easier it is to pin the key length down: it is likely to be the highest common factor/greatest common divisor of the distances, or a factor of it.

Example

(Courtesy of Wikipedia, with some added elaboration.)

Take the plaintext

cryptoisshortforcryptography

'crypto' appears twice in the plaintext, and the distance between the two occurrences is 16 characters (count from the first 'c' up to the 'r' just before the second 'crypto').

If the key is 'abcdef', its length is 6, which doesn't divide 16, so we don't get a repeat in the ciphertext:


abcdefabcdefabcdefabcdefabcd
cryptoisshortforcryptography
csasxtitukswtgqugwyqvrkwaqjb

'abcdef' lines up with 'crypto' the first time, but for the second 'crypto' the key letters are 'efabcd', so the ciphertext doesn't repeat.

But if the key is 'abcd', its length is 4, which divides 16, so the ciphertext repeats:


abcdabcdabcdabcdabcdabcdabcd
cryptoisshortforcryptography
csastpkvsiqutgqucsastpiuaqjb

You can see that 'abcdab' lines up with 'crypto' both times. And hey presto, we get a repeat in the ciphertext: 'csastp'.
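A rough Python sketch of the Kasiski Examination on this example. It encrypts the plaintext above with the key 'abcd' (adding key letters, a=0, b=1, ...), lists every repeated substring of a minimum length together with the gaps between repeats, and takes the greatest common divisor of those gaps. The minimum length of 3 and the function names are assumptions for this example. Here there is only one distance (16), so the GCD is a multiple of the true key length 4; real ciphertexts also produce accidental repeats, so in practice you look at the most common factors rather than trusting a single GCD.

```python
from functools import reduce
from math import gcd
import string

LOWER = string.ascii_lowercase


# Illustrative helper names, not from any library.
def vigenere_encrypt(plaintext: str, key: str) -> str:
    return "".join(LOWER[(LOWER.index(p) + LOWER.index(key[i % len(key)])) % 26]
                   for i, p in enumerate(plaintext))


def repeated_distances(ciphertext: str, min_len: int = 3) -> dict:
    """Map each repeated substring of length min_len to the gaps between occurrences."""
    positions = {}
    for i in range(len(ciphertext) - min_len + 1):
        positions.setdefault(ciphertext[i:i + min_len], []).append(i)
    return {s: [b - a for a, b in zip(p, p[1:])]
            for s, p in positions.items() if len(p) > 1}


ciphertext = vigenere_encrypt("cryptoisshortforcryptography", "abcd")
repeats = repeated_distances(ciphertext)
all_distances = [d for ds in repeats.values() for d in ds]
print(ciphertext)                  # csastpkvsiqutgqucsastpiuaqjb
print(repeats)                     # {'csa': [16], 'sas': [16], 'ast': [16], 'stp': [16]}
print(reduce(gcd, all_distances))  # 16 (a multiple of the key length 4)
```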


Friday, June 22, 2018