Calculate a hash of a pattern that does not depend on the order of letters in the pattern (for example, use the sum of the character codes for each letter). Then apply the same hash function in rolling mode to the text, as in Rabin-Karp. If the hashes match, you need to perform a full test of the template in the current window in the text, because the hash may collide with other values.
By associating each character in the alphabet with a prime, then calculating the product of these primes as your hash code, you will have fewer collisions.
However, there is a bit of mathematical trickery that will help you if you want to calculate the running product as follows: each time you step into the window, multiply the current hash code by the multiplicative reverse code for the character exiting the window, and then multiply by the code for character that goes into the window.
As an example, suppose you compute a hash of the letters 'a' - 'z' as an unsigned 64-bit value. Use the table as follows:
symbol | code | code -1
------- + ------ + ---------------------
a | 3 | 12297829382473034411
b | 5 | 14757395258967641293
c | 7 | 7905747460161236407
d | 11 | 3353953467947191203
e | 13 | 5675921253449092805
...
z | 103 | 15760325033848937303
The multiplicative inverse of n is a number that gives 1 when multiplied by n modulo some number. The module here is 2 64 since you are using 64-bit numbers. So 5 * 14757395258967641293 should be 1, for example. This works because you just multiply in GF (2 64 ).
Computing a list of first primes is easy, and your platform should have a library to efficiently calculate the multiplicative inverse of these numbers.
Start coding with number 3, because 2 is compatible with the size of an integer (power 2 on any processor you are working on) and cannot be inverted.
source share