Saturday 15:45–16:30 in Audimax

Solving very simple substitution ciphers algorithmically

Stephen Enright-Ward

Audience level:


A historical text may be unreadable today because its language is unknown, because its script has been forgotten (or both), or because it was deliberately enciphered. Deciphering takes two steps: identify the language, then map the unknown script to a familiar one. I’ll present an algorithm that solves a cartoon version of this problem, in which the language is known and the cipher is a rearrangement of the alphabet.


A monoalphabetic substitution cipher (MASC) is the simplest kind of encryption scheme: each plaintext character is always replaced by the same ciphertext character. I'll explain a naive algorithm (generally known, not original work) for breaking MASCs using character-level statistics gathered from a training text, under two artificial assumptions: (i) we know the underlying language of the ciphertext message, and (ii) the spaces have been preserved in the ciphertext. Here is a Python implementation:
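Independently of the speaker's implementation, the general idea can be illustrated with a minimal sketch. It assumes lowercase ASCII letters a-z, a training text in the same language as the message, and spaces left untouched by the cipher. All names here (`initial_key`, `bigram_model`, `refine`, `solve`) are hypothetical, and the greedy swap refinement is one common textbook variant, not necessarily the algorithm presented in the talk.

```python
import math
import string
from collections import Counter

ALPHABET = string.ascii_lowercase


def letter_ranking(text):
    """Letters of ALPHABET ordered from most to least frequent in text."""
    counts = Counter(c for c in text if c in ALPHABET)
    # Ties (including letters that never occur) are broken alphabetically.
    return sorted(ALPHABET, key=lambda c: (-counts[c], c))


def initial_key(ciphertext, training_text):
    """First guess: map cipher letters to plain letters by frequency rank."""
    return dict(zip(letter_ranking(ciphertext), letter_ranking(training_text)))


def decipher(ciphertext, key):
    """Apply key to letters; spaces and other characters pass through."""
    return "".join(key.get(c, c) for c in ciphertext)


def bigram_model(training_text):
    """Add-one smoothed log-probabilities of adjacent character pairs.

    The space counts as a symbol, which is where assumption (ii) helps:
    word-initial and word-final letters carry useful statistics.
    """
    symbols = ALPHABET + " "
    text = "".join(c for c in training_text.lower() if c in symbols)
    counts = Counter(zip(text, text[1:]))
    total = sum(counts.values()) + len(symbols) ** 2
    return {(a, b): math.log((counts[(a, b)] + 1) / total)
            for a in symbols for b in symbols}


def score(text, model):
    """Log-likelihood of text under the model; unknown pairs are skipped."""
    return sum(model.get(pair, 0.0) for pair in zip(text, text[1:]))


def refine(ciphertext, key, model):
    """Greedy hill climbing: swap two plain letters while the score improves."""
    key = dict(key)  # leave the caller's key unchanged
    best = score(decipher(ciphertext, key), model)
    improved = True
    while improved:
        improved = False
        letters = list(key)
        for i, a in enumerate(letters):
            for b in letters[i + 1:]:
                key[a], key[b] = key[b], key[a]
                trial = score(decipher(ciphertext, key), model)
                if trial > best:
                    best, improved = trial, True
                else:
                    key[a], key[b] = key[b], key[a]  # undo the swap
    return key


def solve(ciphertext, training_text):
    """Return (guessed plaintext, cipher-to-plain key) for a lowercase MASC."""
    ciphertext, training_text = ciphertext.lower(), training_text.lower()
    key = initial_key(ciphertext, training_text)
    key = refine(ciphertext, key, bigram_model(training_text))
    return decipher(ciphertext, key), key
```

Calling `solve(ciphertext, training_text)` with a long training text (a novel, say) and a ciphertext of a few hundred characters gives the statistics a reasonable chance to work. Greedy search can stall in a local optimum, so short messages may come out only partly readable.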

NOTE: This talk is an exposition of old ideas, and contains no original scientific work.
