A historical text may be unreadable today because its language is unknown, because its script has been forgotten (or both), or because it was deliberately enciphered. Deciphering it takes two steps: identify the language, then map the unknown script to a familiar one. I’ll present an algorithm that solves a cartoon version of this problem, in which the language is known and the cipher is a rearrangement of the alphabet.
A monoalphabetic substitution cipher (MASC) is the simplest kind of encryption scheme: every occurrence of a given plaintext character is replaced by the same ciphertext character. I'll explain a naive algorithm, generally known and not original work, for breaking MASCs using character-level statistics gathered from a training text, under the artificial assumptions that (i) we know the underlying language of the message, and (ii) the spaces have been preserved in the ciphertext. Here is a Python implementation:
NOTE: This talk is an exposition of old ideas, and contains no original scientific work.
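To make the setup concrete, here is a minimal sketch of a MASC and of one very simple statistical attack on it. This is not the implementation from the talk. It assumes a lowercase 26-letter Latin alphabet and uses plain frequency-rank matching, which is only one possible reading of "character-level statistics". The names `make_key`, `apply_key`, `rank_letters` and `guess_key` are hypothetical and chosen for illustration.

```python
import random
import string
from collections import Counter

ALPHABET = string.ascii_lowercase  # assumption: 26 lowercase Latin letters


def make_key(seed=None):
    """Return a random rearrangement of the alphabet as a plaintext -> ciphertext dict."""
    rng = random.Random(seed)
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))


def apply_key(text, key):
    """Substitute each letter via key; spaces and other characters pass through."""
    return "".join(key.get(ch, ch) for ch in text.lower())


def rank_letters(text):
    """Return the letters of ALPHABET ordered from most to least frequent in text."""
    counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
    # Ties, including letters that never occur, are broken alphabetically.
    return sorted(ALPHABET, key=lambda ch: (-counts[ch], ch))


def guess_key(ciphertext, training_text):
    """Guess a ciphertext -> plaintext dict by pairing letters of equal frequency rank."""
    return dict(zip(rank_letters(ciphertext), rank_letters(training_text)))


if __name__ == "__main__":
    key = make_key(seed=0)
    secret = apply_key("the quick brown fox", key)
    inverse = {c: p for p, c in key.items()}
    print(secret)                      # spaces survive, letters are rearranged
    print(apply_key(secret, inverse))  # the true inverse key recovers the message
```

Rank matching of this kind can only be expected to work when the ciphertext is long and its letter frequencies closely track those of the training text. On short messages many letters will be paired wrongly, which is where the preserved spaces (and hence word shapes) become useful extra evidence.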