Python: Is there a good way to check if text is encrypted?

I have been playing with Cryptocat, an interesting online chat service that lets you encrypt your messages with a key, so that only people with the same key can read them. An interesting aspect of the service (in my opinion) is that text encrypted with a different key than yours is displayed simply as "[encrypted]", rather than as a bunch of ciphertext garbage. My question: in Python, is there a good way to determine whether a given piece of text is encrypted? I use RC4 for this example because it was the fastest thing I could implement (based on Wikipedia pseudo-code ..

+4
3 answers

There is no guaranteed way to tell, but in practice you can do two things:

  • Check for a large number of non-ASCII characters (if you expect people to send text in English).

  • Check the distribution of values. In plain text, some letters are much more common than others, but in ciphertext all values occur about equally often.

An easy way to do the latter is to see whether any character occurs more than (N / 256) + 5 * sqrt(N / 256) times, where N is the total number of characters. If one does, the text is probably natural language (unencrypted).

In Python (with the logic above inverted, so that it returns True when the text looks encrypted):

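    from collections import defaultdict
    from math import sqrt

    def encrypted(text):
        # Count how often each character (or byte value) occurs.
        scores = defaultdict(int)
        for letter in text:
            scores[letter] += 1
        if not scores:
            return False  # empty input: nothing to judge
        largest = max(scores.values())
        # Expected count per value if all 256 values were equally likely.
        average = len(text) / 256.0
        return largest < average + 5 * sqrt(average)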

The math comes from treating the count of each value as roughly Gaussian around the average, with a variance equal to the average. The threshold is therefore the average plus five standard deviations. This is not exact, but it is probably pretty close. For very short inputs, where the test is unreliable, it defaults to returning False. (Sorry, I previously had a wrong version using "max()", in which the logic for small inputs was wrong.)
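
The first check in the list above (counting non-ASCII characters) is not shown in the answer. A minimal sketch of it might look like the following, assuming the input is available as bytes. The function names and the 0.3 threshold are illustrative assumptions, not values from the answer, and would need tuning on real traffic.

```python
def non_ascii_fraction(data: bytes) -> float:
    """Fraction of bytes that are neither printable ASCII nor common whitespace."""
    if not data:
        return 0.0
    # Printable ASCII (space through tilde) plus tab, line feed, carriage return.
    expected = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    unusual = sum(1 for b in data if b not in expected)
    return unusual / len(data)


def looks_non_english(data: bytes, threshold: float = 0.3) -> bool:
    """Hypothetical helper: flag data whose share of unusual bytes exceeds threshold."""
    return non_ascii_fraction(data) > threshold
```

This only makes sense when plain messages are expected to be ASCII English. Text in other languages, or ciphertext that has been Base64-encoded, would defeat it.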

+11

Any cipher worthy of the name produces output that looks completely random. You can use this fact for a quick test of whether you are dealing with encrypted text or with data that merely follows an unknown protocol. Check the distribution of byte values in the byte stream you are eavesdropping on: if all values are evenly distributed, there is a good chance that you are dealing with encrypted text.
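
One possible way to put a number on "evenly distributed" is the Shannon entropy of the byte histogram. This is an illustration rather than something the answer prescribes, and the function name is an assumption. Uniformly distributed bytes approach 8 bits per byte, while English text usually scores well below that.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte values in data, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # maps each byte value to its number of occurrences
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Short inputs cannot reach high entropy no matter how random they are, so the score is only meaningful on samples of a few hundred bytes or more. Compressed data also scores close to 8, so a high value suggests, but does not prove, encryption.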

To gain more confidence in the decision, you can expand the tests to something more complex, for example, analyze the distribution of pairs or triplets of bytes, etc.

Conversely, you can compare digram and trigram statistics for your particular language of interest with the occurrences in the data you observe. If your data behaves similarly, it is more likely that you are looking at plain text.
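
As a rough sketch of the digram comparison, the snippet below measures how many adjacent character pairs fall in a small set of frequent English digrams. The set is a short illustrative sample, not an authoritative frequency table, and the function name is an assumption. A real implementation would compare against full digram frequencies for the language in question.

```python
# Illustrative sample of frequent English digrams (an assumption, not a full table).
COMMON_DIGRAMS = {"th", "he", "in", "er", "an", "re", "on", "at", "en", "nd"}


def common_digram_fraction(text: str) -> float:
    """Fraction of adjacent character pairs that are in COMMON_DIGRAMS."""
    lowered = text.lower()
    pairs = [lowered[i:i + 2] for i in range(len(lowered) - 1)]
    if not pairs:
        return 0.0
    hits = sum(1 for pair in pairs if pair in COMMON_DIGRAMS)
    return hits / len(pairs)
```

For raw bytes, decoding with `data.decode("latin-1")` first maps every byte to one character without raising an error. English prose should give a clearly higher fraction than random-looking data, where these ten pairs are only a tiny share of all possible pairs.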

+4

One way to tell is with padding. Add a standard padding to the end of the message before encrypting it. If the decrypted message does not end with the standard padding, it was decrypted using the wrong key. The converse is not guaranteed, but it is often true.
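
The idea can be sketched as follows: append a fixed marker before encrypting and look for it after decrypting. RC4 is used only because the question uses it (it is no longer considered secure). The marker value and the names `seal` and `open_message` are hypothetical choices for this example.

```python
MARKER = b"--END-OF-MESSAGE--"  # hypothetical fixed padding, chosen for this sketch


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4; the same call encrypts and decrypts. The key must be non-empty."""
    state = list(range(256))
    j = 0
    for i in range(256):  # key scheduling
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    i = j = 0
    out = bytearray()
    for byte in data:  # keystream generation, XORed with the input
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


def seal(key: bytes, message: bytes) -> bytes:
    """Append the marker, then encrypt."""
    return rc4(key, message + MARKER)


def open_message(key: bytes, ciphertext: bytes):
    """Decrypt; return the message if the marker is present, otherwise None."""
    plain = rc4(key, ciphertext)
    if plain.endswith(MARKER):
        return plain[:-len(MARKER)]
    return None
```

A chat client could display "[encrypted]" whenever `open_message` returns None. As the answer notes, a matching marker does not strictly prove that the key was correct, only that a mismatch is very unlikely.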

0
