Murmurhash 2 Results in Python and Haskell

Haskell and Python disagree on the MurmurHash2 result for the same input. Python, Java, and PHP all return the same value, but Haskell does not. Am I doing something wrong with MurmurHash2 in Haskell?

Here is my code for Haskell Murmurhash2:

    import Data.Digest.Murmur32

    main = do
      print $ asWord32 $ hash32WithSeed 1 "woohoo"

And here is the code written in Python:

    import murmur

    if __name__ == "__main__":
        print murmur.string_hash("woohoo", 1)

Python returned 3650852671, while Haskell returned 3966683799.

+8
2 answers

The murmur-hash package (I am its author) makes no promise to compute the same hashes as implementations in other languages. If you rely on hashes being compatible with other software, I suggest you create newtype wrappers that compute the hashes the way you want. For text in particular, you need to at least specify the encoding. In your case, you could convert the text to an ASCII string using Data.ByteString.Char8.pack, but that still does not give you the same hash, since the ByteString instance is more of a placeholder.

By the way, I am not actively improving this package because MurmurHash2 has been superseded by MurmurHash3, but I will keep accepting patches.
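To illustrate the point about encodings: a hash function works on bytes, so the same text can produce different inputs depending on how it is encoded. The following is only a sketch in Python; the helper name `encoded_forms` is made up for this example and is not part of any murmur library.

```python
def encoded_forms(text):
    """Return the byte sequences a hash function would see
    under two different text encodings (hypothetical helper)."""
    return {
        "utf-8": text.encode("utf-8"),
        "latin-1": text.encode("latin-1"),
    }

# Pure ASCII text encodes to the same bytes in both encodings.
ascii_forms = encoded_forms("woohoo")

# A non-ASCII character does not: U+00EF takes two bytes in UTF-8
# but only one byte in Latin-1.
accented_forms = encoded_forms("na\u00efve")

print(ascii_forms["utf-8"] == ascii_forms["latin-1"])
print(len(accented_forms["utf-8"]), len(accented_forms["latin-1"]))
```

For the ASCII string in the question the choice of encoding happens not to matter, which is consistent with the remark that encoding alone does not explain the mismatch.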

+3

From a quick inspection of the sources, it looks like the algorithm operates on 32 bits at a time. The Python version gets these by simply grabbing 4 bytes at a time from the input string, while the Haskell version converts each character into a single 32-bit Unicode code point.

Therefore, it is not surprising that they give different results.
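The difference can be made concrete with a small Python sketch that builds the 32-bit words each approach would feed into the mixing step. This is not the code of either library: the function names are invented for illustration, and little-endian byte order is an assumption (it matches what a raw 4-byte read would give on x86). Leftover bytes that do not fill a whole word are simply omitted here.

```python
import struct

def words_from_bytes(data):
    """Group the input into 32-bit words, 4 bytes per word.
    Assumes little-endian order; trailing bytes are left out."""
    n = len(data) // 4
    return list(struct.unpack("<%dI" % n, data[:4 * n]))

def words_from_chars(text):
    """Turn every character into its own 32-bit word (its code point)."""
    return [ord(c) for c in text]

byte_words = words_from_bytes("woohoo".encode("ascii"))
char_words = words_from_chars("woohoo")

print([hex(w) for w in byte_words])  # one word covering "wooh"
print([hex(w) for w in char_words])  # six words, one per character
```

With "woohoo", the byte-wise version yields a single full word plus a two-byte tail, while the per-character version yields six words, so the two hashes mix entirely different inputs even with the same seed.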

+5
