Can a deterministic hash function be easily decrypted?

Possible duplicates:
Is it possible to decrypt MD5 hashes?
Can SHA1 be undone?

I asked this question: working with huge spreadsheets

I got a great answer and followed the advice. I used this: http://splinter.com.au/blog/?p=86

and hashed about 300,000 different items in an Excel table column.

For example, you can enter:

 =SHA1HASH("The quick brown fox jumps over the lazy dog")

and it returns:

 2fd4e1c67a2d28fced849ee1bb76e7391b93eb12 

Can you go back?

I mean, if it encrypts the same text the same way every time, what is the point?

If you know the hash algorithm, is it possible to go back?

Could you explain to me very simply how hashing works? How can you turn 20 GB into a 40-character hash? How long does it take to hash a 20 GB hard drive?

+7
Tags: security, algorithm, encryption, hash
11 answers

I understand where you are coming from: you are trying to hide Social Security numbers. If someone knows that you are applying SHA1HASH to the SSN to create a unique identifier, they can simply generate a list of all SSNs, run each one through SHA1HASH, and compare the results with your identifiers to recover each person's SSN from the records. Worse, they can precompute all of this once into a table with one hash key per SSN. This is called a hash lookup table, and more sophisticated forms are called rainbow tables.
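A minimal Python sketch of the lookup-table attack described above. It assumes SHA1HASH returns the lowercase hex SHA-1 of the UTF-8 text, as the question's example output suggests, and, to keep it fast, it only tabulates the 10,000 SSNs under one hypothetical prefix (123-45); covering all ~10^9 SSNs is the same loop, just bigger. The helper names are illustrative.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Lowercase hex SHA-1 of the UTF-8 encoding of text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def build_lookup_table(area: str, group: str) -> dict:
    """Map hash -> SSN for all 10,000 serial numbers under one area/group prefix."""
    table = {}
    for serial in range(10_000):
        ssn = f"{area}-{group}-{serial:04d}"
        table[sha1_hex(ssn)] = ssn
    return table


# Precompute once; afterwards every stolen identifier is a single dictionary lookup.
table = build_lookup_table("123", "45")
stolen_id = sha1_hex("123-45-6789")  # an identifier taken from the spreadsheet
print(table.get(stolen_id))  # 123-45-6789
```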

This is why a second technique was invented, called salting. Salting basically works like this: you create a salt, then combine your data with the salt before hashing. For example, say you had the SSN 123-45-6789. You could salt it with the string "MOONBEAM", so the string you hash becomes "123-45-6789MOONBEAM".

Now, even if someone knows that you hashed an SSN to generate your unique identifier, they still do not know which salt you used, so they cannot recover the original SSN by pre-hashing the list of all SSNs and comparing the results with your identifier. You, however, can always take a user's SSN, append the salt, and re-hash SSN + SALT to check whether it matches that user's identifier.
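The check described above (append the salt, re-hash, compare) might look like this in Python. This is a sketch: it reuses the answer's example salt "MOONBEAM" and assumes the same hex SHA-1 encoding as SHA1HASH; `salted_id` and `matches` are hypothetical names.

```python
import hashlib
import hmac

# Example salt from the answer above; in a real system it must not be public.
SALT = "MOONBEAM"


def salted_id(ssn: str, salt: str = SALT) -> str:
    """Hex SHA-1 of SSN + salt, i.e. the 'SSN + SALT' string from the answer."""
    return hashlib.sha1((ssn + salt).encode("utf-8")).hexdigest()


def matches(ssn: str, stored_id: str, salt: str = SALT) -> bool:
    """Re-hash a candidate SSN with the salt and compare it with a stored identifier."""
    # compare_digest compares the two strings in constant time.
    return hmac.compare_digest(salted_id(ssn, salt), stored_id)


stored = salted_id("123-45-6789")
print(matches("123-45-6789", stored))  # True
print(matches("123-45-6788", stored))  # False
```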

Finally, if you use only one salt for everything and keep it secret, an attacker cannot simply read the salt, try SSN + salt for every possible SSN, and pick the match; they have to do a lot more work to find an SSN. This matters because the space of SSNs has relatively little entropy (only about 10^9, or a billion, combinations). With a secret salt added, instead of just running

    SHA1HASH(111-11-1111) -> check hash match
    SHA1HASH(111-11-1112) -> check hash match
    SHA1HASH(111-11-1113) -> check hash match

they would have to run

    SHA1HASH(111-11-1111a) -> check hash match
    SHA1HASH(111-11-1111b) -> check hash match
    SHA1HASH(111-11-1111c) -> check hash match
    ...
    SHA1HASH(111-11-1111azdfg) -> check hash match
    SHA1HASH(111-11-1111azdfh) -> check hash match
    ...
    SHA1HASH(111-11-1111zzzzzzzzzzzzzzzz) -> check hash match
    SHA1HASH(111-11-1112a) -> check hash match
    SHA1HASH(111-11-1112b) -> check hash match

...and so on, until they finally reach

    SHA1HASH(123-45-6789MOONBEAM) -> check hash match

at which point they have finally cracked SSN + SALT.

They don't even know how many characters your salt has. So they need roughly (number of possible salt characters)^(length of your salt) times more work to get just one SSN, let alone the whole table.
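The search the answer describes can be sketched in Python to make the cost visible. The setup is hypothetical: the salt is assumed to use lowercase letters only (as in the listing above), the attacker is assumed to have narrowed the SSN down to ten candidates, and the secret salt is the two-letter string "mb". The loop order matches the listing: every salt for one SSN, then the next SSN.

```python
import hashlib
import itertools
import string


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def brute_force(target: str, ssns, max_salt_len: int):
    """Try every candidate SSN with every lowercase salt of length 1..max_salt_len.

    Returns (ssn, salt, attempts) on success, or (None, None, attempts) on failure.
    """
    attempts = 0
    for ssn in ssns:
        for length in range(1, max_salt_len + 1):
            for chars in itertools.product(string.ascii_lowercase, repeat=length):
                salt = "".join(chars)
                attempts += 1
                if sha1_hex(ssn + salt) == target:
                    return ssn, salt, attempts
    return None, None, attempts


# Hypothetical setup: a secret two-letter salt and ten candidate SSNs.
target = sha1_hex("123-45-6789" + "mb")
candidates = [f"123-45-{n:04d}" for n in range(6780, 6790)]
print(brute_force(target, candidates, max_salt_len=2))  # ('123-45-6789', 'mb', 6658)
```

Without a salt, the same search takes at most 10 hashes; with an unknown salt of up to two lowercase letters it takes up to 10 × (26 + 26²) = 7,020, and each additional salt character multiplies the work by another 26.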

Update: many years later, I see that the salting information in my original answer was incorrect. Please see the posts and comments below for the correct information about using a unique salt for each entry, since this is still the first answer in the thread. If, after reading them, you think I should change this answer, leave a comment below (or upvote an existing one), and if a consensus is reached, I will correct it.
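A minimal sketch of the per-entry salts the update refers to, assuming a random 16-byte salt from Python's `secrets` module stored alongside each hash. SHA-1 is kept only for consistency with the rest of the thread; `make_record` and `verify` are illustrative names.

```python
import hashlib
import hmac
import secrets


def make_record(ssn: str) -> tuple:
    """Return (salt, digest) using a fresh random salt for this one record."""
    salt = secrets.token_hex(16)  # 16 random bytes as 32 hex characters
    digest = hashlib.sha1((ssn + salt).encode("utf-8")).hexdigest()
    return salt, digest


def verify(ssn: str, salt: str, digest: str) -> bool:
    """Re-hash the candidate with the record's own stored salt and compare."""
    candidate = hashlib.sha1((ssn + salt).encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, digest)


salt_a, digest_a = make_record("123-45-6789")
salt_b, digest_b = make_record("123-45-6789")
print(digest_a == digest_b)                     # False (with overwhelming probability)
print(verify("123-45-6789", salt_a, digest_a))  # True
```

Because each salt is stored next to its hash, a per-record salt does not slow down a targeted guess at one record (all ~10^9 SSNs can still be tried against it); what it defeats is one precomputed table covering every record. It also means the digest can no longer serve as a shared identifier for matching rows, since equal SSNs no longer produce equal digests.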

+9

General answer

A cryptographic hash function cannot easily be reversed. That is why it is also sometimes called a one-way function. There is no way back.

You should also be careful about calling this decryption. Hashing is not the same as encryption. The set of possible hash values is usually smaller than the set of possible inputs, so several inputs map to the same output.

For any hash function, given only the output, you cannot know which of the many possible inputs was used to produce that particular result.

For cryptographic hashes such as SHA1, it is very difficult even to find a single input that produces a given output.

The easiest way to "reverse" a cryptographic hash is to guess an input, hash it, and see whether it gives the right output. If it doesn't, guess again. Another approach is to use rainbow tables.
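The guess-and-check approach as a short Python sketch. The word list is a stand-in for whatever candidate inputs an attacker would try, and `guess_preimage` is a hypothetical helper.

```python
import hashlib


def guess_preimage(target_hex: str, guesses):
    """Hash each guess in turn and return the first one whose SHA-1 matches."""
    for guess in guesses:
        if hashlib.sha1(guess.encode("utf-8")).hexdigest() == target_hex:
            return guess
    return None  # none of the guesses produced the target hash


# Hypothetical word list; real attackers use lists with millions of entries.
words = ["password", "letmein", "The quick brown fox jumps over the lazy dog"]
print(guess_preimage("2fd4e1c67a2d28fced849ee1bb76e7391b93eb12", words))
```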

Regarding the use of hashing for SSN encryption

In the case of SSNs, this attack is feasible because of the relatively small number of possible input values. If you are worried about people getting access to SSNs, it's best not to store or use SSNs in your application at all and, in particular, not to use them as identifiers. Instead, you can find or create another identifier, such as an email address, username, GUID, or just an incrementing number. You might be tempted to use the SSN because it already exists and at first glance appears to be a unique, unchanging identifier, but in practice using it just causes problems. If for some reason you do need to store SSNs, use strong non-deterministic encryption with a secret key, and make sure you keep that key safe.
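A sketch of the GUID suggestion, assuming spreadsheet rows are represented as Python dictionaries (the row layout and `add_surrogate_ids` are hypothetical). `uuid.uuid4()` generates a random identifier, so it carries no information about the SSN.

```python
import uuid


def add_surrogate_ids(rows):
    """Return copies of rows with a random 'id' that has no relationship to any SSN."""
    return [{**row, "id": str(uuid.uuid4())} for row in rows]


# Hypothetical spreadsheet rows; the SSN column itself is left out entirely.
rows = [{"name": "Alice"}, {"name": "Bob"}]
for row in add_surrogate_ids(rows):
    print(row["name"], row["id"])
```

Unlike a hash of the SSN, this identifier cannot be recomputed from the SSN, so it has to be stored with the row; that is exactly why leaking it reveals nothing.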

+23

The whole point of a cryptographic hash is that you cannot decrypt it, and that it hashes the same input the same way every time.

A very common use case for cryptographic hashes is password verification. Imagine that I have the password "mypass123" and its hash is "aef8976ea17371bbcd". A program or website that wants to verify my password can then store the hash "aef8976ea17371bbcd" in its database instead of the password, and every time I want to log in, the site or program re-hashes the password I enter and checks that the hashes match. This lets the site or program avoid storing my actual password, which protects it (in case I use it elsewhere) if the data is stolen or otherwise compromised: the attacker will not be able to go back from the hash to the password.
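A sketch of the store-the-hash-and-re-hash flow in Python. Note a substitution: instead of a plain hash, it uses PBKDF2 (`hashlib.pbkdf2_hmac`), a salted and deliberately slow construction commonly used for passwords, because a plain fast hash of a password is open to the guessing attacks discussed elsewhere in this thread. The iteration count and helper names are assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your hardware


def hash_password(password: str) -> tuple:
    """Return (salt, derived_key) to store instead of the password itself."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def check_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key from the login attempt and compare it with the stored one."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, stored_key)


salt, stored = hash_password("mypass123")
print(check_password("mypass123", salt, stored))  # True
print(check_password("mypass124", salt, stored))  # False
```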

Another common use of cryptographic hashes is integrity checking. Suppose that a given file (for example, a Linux distribution CD image) has a well-known, publicly available cryptographic hash. If you have a file that is supposed to be the same, you can hash it yourself and see whether the hashes match. Here, the fact that it hashes the same way every time lets you verify the file independently, and the fact that the hash is cryptographically secure means that no one can realistically create a different, fake file (for example, one with a trojan in it) that has the same hash.
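Hashing a large file does not require loading it into memory; the data can be fed to the hash in chunks. A sketch of the integrity check, assuming the published checksum is a SHA-1 hex string (real distributions may publish other algorithms, such as SHA-256); `file_sha1` and `verify_download` are hypothetical helpers.

```python
import hashlib
import os
import tempfile


def file_sha1(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file of any size in 1 MiB chunks so memory use stays constant."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str, published_hex: str) -> bool:
    """Compare the file's hash with the published one (whitespace and case ignored)."""
    return file_sha1(path) == published_hex.strip().lower()


# Demo: write a small file, then check it against the known SHA-1 of its contents.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"The quick brown fox jumps over the lazy dog")
try:
    print(verify_download(tmp.name, "2fd4e1c67a2d28fced849ee1bb76e7391b93eb12"))  # True
finally:
    os.remove(tmp.name)
```

This streaming approach is also how a 20 GB drive or image gets reduced to 40 characters: the run time is roughly determined by whichever is slower, reading the data from disk or SHA-1's throughput on your CPU, while memory use stays at about one chunk.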

Remember the very important difference between hashing and encryption: hashing loses information. That is why you cannot go back (decrypt) from the hash. You can hash a 20-gigabyte file and end up with a 40-character hash. Obviously, a lot of information has been lost in the process. How could you "decrypt" 40 characters back into 20 GiB? No compression works that well! But this is also an advantage, because to check the integrity of a 20-gigabyte file you only need to distribute a 40-character hash.
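A quick way to see the fixed output size in Python: hash inputs of very different lengths and compare the digest lengths. SHA-1 always produces 160 bits (20 bytes), which is 40 characters in hexadecimal.

```python
import hashlib

# Inputs of very different sizes: 0 bytes, 43 bytes, and 10 MiB of zero bytes.
inputs = [b"", b"The quick brown fox jumps over the lazy dog", bytes(10 * 1024 * 1024)]

for data in inputs:
    digest = hashlib.sha1(data).hexdigest()
    # Second column: the digest length, which is 40 hex characters every time.
    print(len(data), len(digest), digest)
```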

Since information is lost, many files will have the same hash, but the key property of a cryptographic hash (the part you are asking about) is that, despite the lost information, it is computationally infeasible to start from a given file and construct a second, slightly different file with the same hash. Any other file with the same hash would be radically different and not easily mistaken for the original.
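One visible consequence of this property is the avalanche effect: a tiny change to the input changes roughly half of the output bits. This Python sketch only illustrates that; it does not prove anything about how hard it is to construct a second file with the same hash. `sha1_bits_differ` is an illustrative helper.

```python
import hashlib


def sha1_bits_differ(a: str, b: str) -> int:
    """Count how many of the 160 output bits differ between SHA-1(a) and SHA-1(b)."""
    x = int(hashlib.sha1(a.encode("utf-8")).hexdigest(), 16)
    y = int(hashlib.sha1(b.encode("utf-8")).hexdigest(), 16)
    return bin(x ^ y).count("1")


original = "The quick brown fox jumps over the lazy dog"
tampered = "The quick brown fox jumps over the lazy cog"  # one letter changed
print(hashlib.sha1(original.encode("utf-8")).hexdigest())
print(hashlib.sha1(tampered.encode("utf-8")).hexdigest())
print(sha1_bits_differ(original, tampered), "of 160 bits differ")
```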

+17

No, you cannot go back, because hashing does not preserve the information.

You can think of a hash function as mapping the source text to a single, huge number. The same number can also be produced by other texts, although a good hash function will have few collisions.

If the original message had been encrypted instead, then yes, you could go back (given the key).

+7

Encryption and hashing are two different things.

Hashing just translates a string into a number. Encryption preserves the contents of the string so that it can be decrypted later. There is no way to get the source string back from a hash: the content simply isn't there anymore.
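To make the contrast concrete, here is a toy one-time pad in Python: XOR with a random key exactly as long as the message. It is chosen only because it fits in a few lines of the standard library and is not a stand-in for a practical cipher such as AES. Unlike a hash, the ciphertext keeps all the information, so the key recovers the original.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the corresponding byte of key (same length)."""
    if len(key) != len(data):
        raise ValueError("one-time pad key must be exactly as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"123-45-6789"
key = secrets.token_bytes(len(message))  # random key, to be used only once
ciphertext = xor_bytes(message, key)     # same length as the message: nothing is lost
recovered = xor_bytes(ciphertext, key)   # applying the key again undoes the encryption
print(recovered == message)  # True
```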

+5

No. The point of a hash is that it is one-way encryption (as others have pointed out, this is not really "encryption", but bear with me here). The downside is that, in theory, there is a small chance of "collisions", where two or more strings return the same hash. But it is usually worth it.

+3

A good hash is one-way; that is, you should not be able to go back. The point is to provide a key for a string without revealing the string. For example, this is a good way to check passwords without storing the password. Instead, you store the hash and compare it with the hash of the input.

+2

No. At least not easily.

SHA1 was still considered cryptographically secure when this was written (practical collision attacks have since been demonstrated, although no practical attack that recovers an arbitrary input from a SHA1 hash is known). A hash algorithm is secure if it is easy to compute in one direction but very hard (requiring exhaustive search) to compute in the other. It is true that every time you hash a given phrase, you get the same hash, but there are infinitely many other phrases that hash to the same value. The security comes from the fact that you do not know what those other phrases are until you run them through the SHA1 function.

+2

No, you cannot go back. Count how many different hashes you can have. Now count how many different strings you can have. The first is finite, the second is infinite. There are many (infinitely many, to be precise) strings that have the same SHA1 sum. The point is that it is very hard to find two texts that have the same hash.
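The counting argument can be made concrete with SHA-1's 160-bit output size and the pigeonhole principle (restricting attention to 21-byte messages is just a convenient choice):

```latex
\begin{align*}
  \#\{\text{SHA-1 digests}\} &= 2^{160} \\
  \#\{\text{21-byte messages}\} &= 256^{21} = 2^{168} > 2^{160} \\
  \text{some digest is shared by at least } &\left\lceil 2^{168} / 2^{160} \right\rceil = 2^{8} = 256 \text{ of them}
\end{align*}
```

The argument says nothing about how to find those colliding messages, which is exactly what a cryptographic hash is designed to make hard.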

You can think of hashing as a kind of shorthand. For example, take a hash function that adds up the ASCII codes of all the letters in a string. Knowing only the sum of the ASCII codes, you cannot tell what the string was before hashing. SHA1 is similar, but much more complicated.
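The answer's toy function, written out in Python (`ascii_sum_hash` is a hypothetical name). Rearranging or trading letters leaves the sum unchanged, so many inputs share each output and the sum alone cannot tell you which one you started from. Unlike SHA1, it is trivially easy to find collisions for.

```python
def ascii_sum_hash(text: str) -> int:
    """Toy hash: add up the character codes. Easy to collide, NOT cryptographic."""
    return sum(ord(ch) for ch in text)


for word in ["abc", "cba", "bbb", "acb"]:
    print(word, ascii_sum_hash(word))  # every one of these prints 294
```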

The point of a hash is not encryption. The point of a hash is to reduce something to a short summary, so that it takes less time to check whether two things are the same. Now, how can you say that two things are really the same if you know that many things have the same hash? Well, you can't. You simply assume that a collision is so rare that it will not happen.

So a hash is not a definitive check: equality testing with hashes is usually used only for confirmation or validation, and its result is probabilistic rather than certain. If you see that the hashes are the same, then, based on the parameters of the particular hash function, you can estimate the likelihood that the hashed objects really match.
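One standard way to make that estimate is the birthday-bound approximation, sketched below in Python. It is not from the answer and it assumes the hash behaves like a uniformly random function; it does not cover deliberately engineered collisions, which have been demonstrated for SHA-1. The 300,000 figure is the row count from the question, and `collision_probability` is a hypothetical helper.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items share a hash (birthday bound).

    Assumes the hash behaves like a uniformly random function with 2**hash_bits outputs.
    """
    pairs = n_items * (n_items - 1) / 2
    # P(collision) ~= 1 - exp(-pairs / 2**bits); expm1 keeps precision when it is tiny.
    return -math.expm1(-pairs / 2 ** hash_bits)


# The question's ~300,000 spreadsheet rows under SHA-1, versus a toy 16-bit hash.
print(collision_probability(300_000, 160))
print(collision_probability(300, 16))
```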

And so the fact that a hash function always gives the same result for the same object is its most important feature. It is what lets you check and compare objects.

+2

The fact that it hashes the same text the same way every time is the whole point of a hash. It's a feature.

If I have a database of password hashes, I can verify that you entered the correct password by hashing it and seeing whether the hash matches the one in the database. But if someone steals my hash database, they won't be able to figure out your password unless they happen to stumble upon some plaintext that hashes to that value.

+1

In cryptography, this is called a digest. A cryptographically strong digest does not let you recover the source text from the digest value without some additional knowledge. The digest is the same for the same text, so you can compute the digest of a text and compare it with a published digest. Password checking is a popular application: you can store the digest instead of the password. This, of course, is vulnerable to the dictionary attack you have already learned about, which is why using dictionary words as passwords is strongly discouraged.

+1
