Since the article mentions finding duplicate symbols which corresponded to duplicate letters in words, it's probably a simple substitution cipher. Such codes can usually be broken with a combination of frequency analysis and guesswork.
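For anyone who hasn't tried it, here is a minimal sketch of the two clues mentioned above: symbol frequencies and doubled symbols. The substitution key is a made-up example (the keyboard row order), not the scheme from the article, and the function names are purely illustrative.

```python
from collections import Counter
import string

# Hypothetical substitution key for illustration only: a->q, b->w, c->e, ...
KEY = str.maketrans(string.ascii_lowercase, "qwertyuiopasdfghjklzxcvbnm")

def symbol_frequencies(ciphertext):
    """Return (symbol, count) pairs, most frequent first, ignoring whitespace."""
    return Counter(ch for ch in ciphertext if not ch.isspace()).most_common()

def doubled_symbols(ciphertext):
    """Return symbols that appear twice in a row, a hint at doubled letters."""
    return [a for a, b in zip(ciphertext, ciphertext[1:])
            if a == b and not a.isspace()]

cipher = "bobbins".translate(KEY)
print(cipher)                      # the enciphered word
print(symbol_frequencies(cipher))  # the symbol standing for 'b' dominates
print(doubled_symbols(cipher))     # the doubled 'bb' survives substitution
```

A simple substitution preserves both the frequency profile and the positions of doubled letters, which is why a word like "bobbins" makes a good first crib.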
This one was apparently made more difficult by the fact that every other symbol was random. (And apparently using some symbols that did not otherwise appear in the code.)
For that you simply need to have the sudden inspiration of how the scheme works. Snark aside, I don't think that discarding half of the source material as random junk is such an obvious thing to do.
You could easily think, 'hey perhaps the outside triangles are different from the inside ones, let's make a histogram of the symbols in both.'
From there, you'd certainly notice if one side had a lot more symbols than the other side.
Trying to analyze both separately is a decent next step, and we are well along to solving this.
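A rough sketch of that idea, under loud assumptions: the data below is synthetic (a repeated English phrase interleaved with uniformly random filler from 40 stand-in glyphs), and the even/odd split is only a guess at how the two groups of triangles might be separated. The index of coincidence is one standard way to compare how "language-like" each half looks.

```python
import random
from collections import Counter

def split_alternating(symbols):
    """Return (symbols at even positions, symbols at odd positions)."""
    return symbols[0::2], symbols[1::2]

def index_of_coincidence(symbols):
    """Chance that two symbols drawn without replacement are identical."""
    n = len(symbols)
    counts = Counter(symbols)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

# Synthetic example data, not the real puzzle.
rng = random.Random(0)
signal = list("whydoesmynosehurtafterconcerts" * 10)
filler_alphabet = [chr(code) for code in range(0x2580, 0x2580 + 40)]
noise = [rng.choice(filler_alphabet) for _ in signal]

# Interleave: signal, noise, signal, noise, ...
stream = [sym for pair in zip(signal, noise) for sym in pair]

first, second = split_alternating(stream)
print(len(set(first)), len(set(second)))  # distinct symbols in each half
print(index_of_coincidence(first), index_of_coincidence(second))
```

The half carrying real text uses fewer distinct symbols and has a noticeably higher index of coincidence than the random half, which is the kind of lopsidedness the histogram would reveal.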
That's not something a normal human would easily think of. I doubt even an experienced code breaker would have easily thought of your suggestion.
But don't worry, go on telling yourself it's something you would have easily thought.
Certainly seems that way. That's how I read "Noticing some repeated pairs of symbols - which represented letters - the first word cracked by GCHQ boffins was Sidebottom's favourite word, "bobbins"."
Though two triangles are used per letter - you can check that with one of the examples which has a message "Why does my nose hurt after concerts?" - 37 characters in total (including spaces), then count the triangles - 74. Hence two inside triangles are used per character.
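The arithmetic is easy to double-check. The character count is computed from the quoted message; the triangle count of 74 is simply the figure given above and is not recounted here.

```python
message = "Why does my nose hurt after concerts?"
triangle_count = 74  # figure quoted in the comment, taken as given

chars = len(message)  # counts letters, spaces and the question mark
triangles_per_char = triangle_count / chars

print(chars, triangles_per_char)
```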
But it certainly highlights how adding noise to any encryption has its upsides.
That's assuming that the secret is the encryption algorithm itself rather than the key. Modern symmetric encryption does not work that way - the algorithm is public and well known while the key is the actual secret required for encryption/decryption.
I don't see how adding noise in modern encryption can help other than increase the size of the output.
Some modes of operation make use of random noise (IV in CBC, nonce in CTR, etc.) because it's a convenient way to get a unique number, but it's not for obscurity; it's needed to prevent attacks on these modes.
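To illustrate why the uniqueness matters, here is a toy sketch. It is not real CTR mode: the keystream comes from SHA-256 over key, nonce and counter instead of a block cipher, and every name in it is made up for the example. It only shows the general failure: reuse a nonce under the same key and the XOR of two ciphertexts equals the XOR of the two plaintexts.

```python
import hashlib
import os

def keystream(key, nonce, length):
    """Toy keystream: SHA-256(key || nonce || counter), block after block."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_bytes(a, b):
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

def toy_ctr(key, nonce, data):
    """Encrypt or decrypt (the operation is its own inverse)."""
    return xor_bytes(data, keystream(key, nonce, len(data)))

key = os.urandom(32)
p1 = b"attack at dawn!!"
p2 = b"retreat at noon!"

reused_nonce = os.urandom(12)
c1 = toy_ctr(key, reused_nonce, p1)
c2 = toy_ctr(key, reused_nonce, p2)  # same nonce: the keystream cancels out
leak = xor_bytes(c1, c2)             # equals p1 XOR p2, no key needed

c2_fresh = toy_ctr(key, os.urandom(12), p2)  # fresh nonce: no such relation
```

The nonce itself is sent in the clear, so it hides nothing; its only job is to make sure the same keystream is never used twice.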
Look up “confounders”: random noise can be extremely useful if you encrypt it as well. This significantly increases the work required to decrypt (since you’ve got to decrypt random noise as well as signal), makes it much harder to tell if you’ve actually decrypted something successfully (depending on how well you can test the plaintext, obviously) and frustrates correlation attacks because every message has a different payload even if the logical payload was the same.
It’s also UK slang for paedophile, so be careful that everyone understands you are talking about cryptography, not sex offenders, if you use the word in public, as there could be a nasty misunderstanding!
There's a lot of vocabulary like that. I've had conversations about ensuring a daemon reaped zombie children in public before we realized what it sounded like.
That was confusing for me at first when David Cross was sent to the "nonce wing" in The Increasingly Poor Decisions of Todd Margaret. It seemed like an odd situation to start chatting about crypto.
A "nonce word" is a word that is created to be used once. For example, if we were surfing off the coast near the Kruger National Park, we might send this tweet: "We are on Surfari!"
The Wikipedia entry for Cryptographic nonce says: "It is similar in spirit to a nonce word, hence the name."
I... don't understand how it works. Doesn't the Wikipedia page listing these words essentially invalidate their "hapax legomenon" status?
Also, GP's "Surfari" doesn't sound like a word meant to be used once, but like a word meant to be funny and with a high probability of becoming a piece of jargon among a band of friends. My wife & I invent words like these all the time (half of them being born from misspellings or moments of confusion). Are they "nonces" too, even though we keep using them?
A hapax legomenon is a hapax legomenon with reference to a particular corpus. In this context, a "corpus" is a set of words, or more generally a set of works under consideration.
Any corpus of one word is, by construction, composed entirely of hapax legomena. I think the Wikipedia page is fairly clear on the subject, honestly. In general, they're a phenomenon which is fairly obvious and uninteresting.
Where it becomes slightly more interesting is when, in some long text, an author uses a word that no one knows, doesn't bother to explain it, and never uses it again. It becomes particularly interesting when trying to translate important ancient texts... what the devil did this word really mean?
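To make the "relative to a corpus" point concrete, here is a small sketch. The tokenizer (lowercase runs of letters and apostrophes) and the sample sentences are assumptions for illustration; real corpus work would need proper tokenization and lemmatization.

```python
import re
from collections import Counter

def hapax_legomena(corpus):
    """Return the words that occur exactly once in the given text, sorted."""
    words = re.findall(r"[a-z']+", corpus.lower())
    counts = Counter(words)
    return sorted(word for word, count in counts.items() if count == 1)

tweet = "We are on Surfari! We are surfing."
print(hapax_legomena(tweet))

# A one-word corpus consists entirely of hapax legomena.
print(hapax_legomena("Surfari"))

# Widen the corpus and the status can disappear.
print(hapax_legomena(tweet + " Another surfari next year."))
```

The same word is a hapax in the tweet alone but not once a second text that reuses it joins the corpus, so a Wikipedia list mentioning the word only matters if Wikipedia is part of the corpus under consideration.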