How to reverse engineer an algorithm?

I am wondering how to make a reverse algorithm, such as one for storing logins or PIN codes.

Suppose I have a quantity of data, where:

7262627 -> ? -> 8172 5353773 -> ? -> 1132 

etc .. This is just an example. Or say a hexadecimal string converted to another.

&h8712 -> &h1283 or something like that.

How do I start figuring out what kind of algorithm it is? Where to begin?

Could you try different shifts, xors and hope that something stands out? I am sure there is a better way, since it looks like he is trembling in the dark.

Is it even possible to reconstruct such an algorithm?

Sorry if this is a stupid question. Thanks for your help / pointers.

+6
algorithm cryptography encryption reverse-engineering
source share
4 answers

There are several things that people try to do:

  • Get the source code or disassemble the executable file.
  • Guess based on the hash functions that other people use. For example, a hash consisting of 32 hexadecimal digits can be one or more repetitions of MD5, and if you can get one pair of input / output data, then it is quite easy to confirm or refute (although see "Salt" below).
  • To statistically analyze a large number of pairs of inputs and outputs, look for some pattern or correlations and associate these correlations with the properties of known hash functions and / or possible operations that the system designer could use. This is beyond the scope of a unified technique in the field of general cryptanalysis.
  • Ask the author. Secured systems usually do not rely on the secrecy of the hashing algorithms that they use (and usually do not remain safe if they do). However, the examples you give are pretty small, and secure password hashing will always include salt, which apparently doesn't. Therefore, we may not talk about the system in which the author is confident in this.

In the case of a hash where the output consists of only 4 decimal digits, you can attack it simply by building a table of all possible 7-digit input data along with your hashed value. Then you can flip the table, and you have (one-to-many) de-hashing. You never need to know how the hash is calculated. How do you get input / output pairs? Well, if the outsider can somehow indicate the value that needs to be hashed and see the result, then you will have what is called the “selected plaintext”, and the attack based on this is the “selected plaintext attack”. Thus, a 7-digit → 4-digit hash would be very weak if it were used in a way that allowed the selected plaintext attacks to generate many input / output pairs. I understand that there is only one example, but it is also just one example of a technique to reverse it.

Note that reverse engineering a hash and actually changing it is two different things. You can understand that I am using SHA-256, but that would not help you cancel it (i.e. Get the result, work out the input value). No one knows how to completely cancel SHA-256, although, of course, there are always rainbow tables (see Salt, above). <conspiracy> At least no one recognizes what they are doing, so it is useless to you or me. </conspiracy>

+8
source share

You probably can't. Suppose the transform function is known, something like

 function hash(text): return sha1("secret salt"+text) 

But the "secret salt" is unknown and cryptographically strong (very large, random integer). You could never drag a secret salt from even a very large number of plaintext pairs, crypttext.

In fact, if the exact hash function used were one of two equally powerful functions, you could not even get a good guess between which it was used.

+3
source share

Leggings in the dark will lead you to madness. There are some algorithms that, given current understanding, might not hope to infer the internal work between the present and the [predicted] end of the universe without knowing the exact information (potentially including private keys or internal state). Of course, some of these algorithms are the basis of modern cryptography.

If you know in advance that there is a sample that needs to be opened, sometimes there are ways to get closer to it. For example, if a dataset contains several input values ​​that differ by 1, compare the corresponding output values:

  7262627 -> 8172
 7262628 -> 819
 7262629 -> 1732
 ...
 7262631 -> 3558

It is pretty clear here (given a few minutes and the calculator) that when the input increases by 1, the output increases by 913 modulo 8266 (i.e. a simple linear congruent generator ).

Differential cryptanalysis is a relatively modern technique used to analyze the strength of cryptographic block ciphers, based on a similar but more complex idea, where the encryption algorithm is known, but suggested that the secret key is not. Input blocks that differ from each other by one bit are considered, and the effect of this bit is traced through a cipher to determine how likely each output bit is to "flip".

Other ways to approach this problem would be to look at extremes (maximum, minimum values), distribution (leading to frequency analysis ), direction (numbers always increase? Decrease?) And (if allowed) consider the context in which the data sets were found . For example, some types of PIN codes always contain a repeating number to make them easier to remember (I'm not saying that a PIN code can necessarily be derived from anything else - just repeating a number is another number to worry about!) .

+2
source share

Is it even possible to reconstruct such an algorithm?

It is possible using an incorrect algorithm and sufficiently encrypted / unencrypted pairs, but a well-developed algorithm can eliminate this possibility altogether.

0
source share

All Articles