Is there a string hash function that satisfies h(x) + h(y) = h(x + y)?

I am trying to save space by using hash values of strings. I have a very specific requirement, a simplified description of which is as follows:

I have two sets of string values, and a query value is provided at runtime. I need to get a list of all entries in the second set that start with an entry from the first set and end with the query value. Here is a greatly simplified view:

    set1:
        my_test_val_1
        my_test_val_2

    set2:
        my_test_val_1_extended_to_another_value
        my_test_val_2_extended_as_well

My goal is to store the hash values of these sets, as in:

    set1: hash(my_test_val_1) ...
    set2: hash(my_test_val_1_extended_to_another_value)

to save space and, when '_extended_to_another_value' arrives as a query, use a hash function with the distributive property over addition (here, string concatenation):

 hash(my_test_val_1) + hash('_extended_to_another_value') = hash_value_to_search 

My attempts to find a hash function that supports this property have failed, most likely because I was not searching with the right keywords. So even if you can only tell me the correct terms for what I describe above, that would help.

+5
2 answers

Here is one of them:

    import java.util.Random;

    public class StringHasher {
        // One random value per possible char (65536 ints).
        private static final int[] CHAR_HASHES = new int[65536];

        static {
            Random rng = new Random(); // unseeded: table differs on every run
            for (int k = 0; k < 65536; k++) {
                CHAR_HASHES[k] = rng.nextInt();
            }
        }

        public static int hash(String s) {
            int result = 0;
            for (int k = 0; k < s.length(); k++) {
                result += CHAR_HASHES[s.charAt(k)];
            }
            return result;
        }
    }

It turns out that any such hash must be constructed by adding up the hashes of the string's individual characters; otherwise, for example, h("hello") = h("h") + h("e") + h("l") + h("l") + h("o") would not hold.
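
The reason can be sketched as a short induction. This assumes that + on strings means concatenation and that the string is written as single characters c_1 c_2 ... c_n (the symbols c_i and n are introduced only for this sketch):

```latex
\begin{aligned}
h(c_1 c_2 \cdots c_n) &= h(c_1) + h(c_2 \cdots c_n) \\
                      &= h(c_1) + h(c_2) + h(c_3 \cdots c_n) \\
                      &\;\;\vdots \\
                      &= \sum_{i=1}^{n} h(c_i)
\end{aligned}
```

So the only freedom left is which value to assign to each single character.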

Note: this means that you cannot have a very collision-resistant hash, since, by the previous paragraph, all strings containing the same characters will have the same hash regardless of the order of those characters.
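
A minimal sketch of that collision, assuming a per-character table like the one in the answer. The class name AnagramCollisionDemo and the fixed seed 42 are assumptions made here so the example is repeatable; they are not part of the original code.

```java
import java.util.Random;

class AnagramCollisionDemo {
    // One pseudo-random value per UTF-16 code unit.
    private static final int[] CHAR_HASHES = new int[65536];

    static {
        Random rng = new Random(42L); // fixed seed: an assumption for repeatability
        for (int k = 0; k < CHAR_HASHES.length; k++) {
            CHAR_HASHES[k] = rng.nextInt();
        }
    }

    // Sum of per-character values; int overflow wraps, which keeps the sum additive.
    static int hash(String s) {
        int result = 0;
        for (int k = 0; k < s.length(); k++) {
            result += CHAR_HASHES[s.charAt(k)];
        }
        return result;
    }

    public static void main(String[] args) {
        // Same characters in a different order: the sums are identical.
        System.out.println(hash("listen") == hash("silent")); // prints true
        // Swapping the two halves of a concatenation also collides.
        System.out.println(hash("ab" + "cd") == hash("cd" + "ab")); // prints true
    }
}
```

Both comparisons hold for any choice of table, because integer addition is commutative.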

Choosing random values for the hash of each single-character string should, on average, give the best collision resistance possible under this constraint. This takes up 256 KB of memory, is not the fastest method, and is not repeatable (the table is regenerated on every run), but it is enough to prove the concept.
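
For the lookup described in the question, one possible variant is sketched below. It swaps the random table for a deterministic per-character mixing step (in the style of the MurmurHash3 32-bit finalizer), so hashes are repeatable across runs and no table is needed. This is a substitution made for illustration, not the method from the answer above; the names AdditiveLookupSketch, charHash and matchingPrefixes are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AdditiveLookupSketch {
    // Deterministic per-character value (assumption: a murmur-style bit mixer
    // replaces the random table, trading the 256 KB table for a few operations).
    static int charHash(char c) {
        int h = c + 0x9E3779B9;
        h ^= h >>> 16;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        h *= 0xC2B2AE35;
        h ^= h >>> 16;
        return h;
    }

    // Additive hash: hash(x) + hash(y) == hash(x + y), with wrapping int arithmetic.
    static int hash(String s) {
        int result = 0;
        for (int k = 0; k < s.length(); k++) {
            result += charHash(s.charAt(k));
        }
        return result;
    }

    // Returns the indices into set1Hashes whose entry, extended by the suffix,
    // matches some stored set2 hash. False positives are possible on collisions.
    static List<Integer> matchingPrefixes(int[] set1Hashes, Set<Integer> set2Hashes, String suffix) {
        int suffixHash = hash(suffix);
        List<Integer> matches = new ArrayList<>();
        for (int i = 0; i < set1Hashes.length; i++) {
            if (set2Hashes.contains(set1Hashes[i] + suffixHash)) {
                matches.add(i);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Only hashes are stored, as the question intends.
        int[] set1 = { hash("my_test_val_1"), hash("my_test_val_2") };
        Set<Integer> set2 = new HashSet<>();
        set2.add(hash("my_test_val_1_extended_to_another_value"));
        set2.add(hash("my_test_val_2_extended_as_well"));

        // The result includes index 0, the entry for my_test_val_1.
        System.out.println(matchingPrefixes(set1, set2, "_extended_to_another_value"));
    }
}
```

Because only hashes are kept, a match identifies an entry by index or hash value and cannot reproduce the original string, and the anagram collisions noted above still apply.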

+3

You could use one of the standard hashing algorithms and try to reverse it using online databases. If x and y are short enough, you may be able to find them in an online MD5 or SHA hash-cracking database, and once the hash has been reversed you can continue with your algorithm.

If your application has network access, it can use this approach. The drawback is that in some corner cases you may get back the wrong value, one that has the same hash code as the correct one, but the likelihood of this is quite low.

This is basically a hack, but your requirement is unusual enough that it may be acceptable to you.

Here is an example of an online hash database:

-2

Source: https://habr.com/ru/post/1215431/