First, you are not talking about a collision. A collision is when someone finds two different messages whose hash has the same value. Here you are not afraid that someone will find another entry whose hash you publish; indeed, you are afraid that someone will find your contribution. The correct term is prefix attack . Sometimes we say that an attacker tries to “invert” a hash function (find the input corresponding to this output).
There are two ways to search for the pre-image for a given hash value: use the hash function weakness or guess the input by overturning candidates.
There is no known SHA-2 weakness with respect to providence. Come to this, there is no such known weakness for MD5 or even MD4, although these two functions are considered cryptographically decomposed. Therefore, not allowing tremendous success in the scientific research of hash functions, the likelihood that your hash value will not be found through the cryptographic weakness of the hash function.
Attempting candidates may or may not, depending on what the attacker knows about the input. It is quite difficult to accurately model. Suppose, for example, that the input is a single word containing seven letters. There are 26 words 7 = 8031810176. Try all of them with SHA-256, comparing each time with your hash value, it takes several minutes on a recent PC with a naive implementation.
On a more general basis, studying a set of possible inputs is called a dictionary attack, since it is often applied to the problem of recovering a user's password: users are depressingly unimaginable and often choose passwords from a limited set, well, “words”, and it is logical to call a “dictionary” this set of words. We also call it "brute force" or "exhaustive search."
Assuming the dictionary is small enough for the attacker to realistically try all his words, then not only will your hash value be ultimately inverted (if there is sufficient incentive for the attacker), but it also opens the way for <strong cost sharing: the attacker can try to share your computer efforts in several similar attack situations (i.e. several hash values for inversion with the same hash function - again, a common password-related attack model). The main cost-sharing method is to make a pre-computed table : an attacker calculates all the hashes for his dictionary once; then all subsequent hash values can be attacked by simply looking at the hash value in the table. The search is very fast (the attacker sorts his hashes in ascending order). Rainbow tables are a kind of pre-computed table, a smart way that allows you to get a compact view: they allow an attacker to "hold" a large pre-computed table without having to load it on your hard drive. However, rainbow or not, all values in the table (one before compression in the case of the rainbow table) must be calculated at least once by an attacker somewhere, that is, someone can make a full dictionary attack. This has two overheads: processor cost (for calculating all hashes) and storage cost (for storing hash values). Rainbow table makes storage cheaper, but does not improve processor performance.
Salting defeats pre-computed tables (including rainbow tables). This makes dictionaries more bearable. That is, if we assume that inverting one value of the hash function is feasible, then the salt ensures that at least the attacker will have to pay the full cost of the processor each time to attack the dictionary, and he will not be able to share its cost in several attacks or other attackers. Passwords require salting, since in general it was impossible to make universal users select and remember passwords from a sufficiently large set of possible passwords.
It is still much better if your entry from the dictionary is large enough to defeat one forced effort. The size of the value set that your input string can take is important; this set should be evaluated as to what the attacker knows about the data being attacked. For example, if an attacker tries to find the user's password, he knows that the input string is short (users have little patience) and consists only of characters that can be entered (blindly!) On the keyboard; and he also knows that the sequence can be remembered, which makes things like ".% f * (. ds / ~ \ d09j @" absolutely unbelievable. There is no limit on the size of the input, we say that the rainbow tables are limited to "15 characters or so "because users who accept more than 15 characters to enter also select passwords from too large a set to provide the only brute force required to build the table. Note that trying all 15 character sequences is already too many (even the whole sequence 15 lowercase letters mean more than 2 70 hash computations, and it is not really possible with today's technology).