Only you can say whether 1 - 2^-32 (the chance that a 32-bit check catches a random corruption) is good enough for your application. The error-detection performance of a CRC-n and of n bits from a good hash function will be very nearly the same, so choose whichever is faster. That is likely to be the CRC-n.
Update:
The above "That is likely to be the CRC-n" is only somewhat likely. It is not so likely if very high performance hash functions are used. In particular, CityHash appears to be very nearly as fast as a CRC-32 calculated using the Intel crc32 hardware instruction! I tested three CityHash routines and the Intel crc32 instruction on a 434 MB file. The crc32 instruction version (which computes a CRC-32C) took 24 ms of CPU time. CityHash64 took 55 ms, CityHash128 60 ms, and CityHashCrc128 50 ms. CityHashCrc128 uses the same hardware instruction, though it does not compute a CRC.
To get the CRC-32C calculation that fast, I had to get fancy: three crc32 instructions on three separate buffers, so as to use the three arithmetic logic units of a single core in parallel, with the inner loop written in assembler. CityHash is pretty damned fast. If you don't have the crc32 instruction, you would be hard-pressed to compute a 32-bit CRC as fast as CityHash64 or CityHash128.
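The answer does not show how the three partial results were merged into one CRC. As a minimal sketch of why the data can be split at all, the Python below relies on the linearity of CRCs: the CRC of `A + B` is the CRC of `A` advanced over `len(B)` zero bytes, XORed with the CRC of `B`. It uses `zlib.crc32` (the standard CRC-32, not CRC-32C and not the hardware instruction) purely as a stand-in, and `crc32_combine` and `crc32_three_way` are hypothetical helper names. Advancing byte by byte over zeros costs as much as the CRC itself, so a fast implementation would need a cheaper shift operator; this only demonstrates the identity.

```python
import zlib

MASK = 0xFFFFFFFF

def crc32_combine(crc_a: int, crc_b: int, len_b: int) -> int:
    """Return the CRC-32 of A + B given crc32(A), crc32(B) and len(B)."""
    # zlib.crc32(data, value) starts its register at value ^ MASK and XORs
    # the result with MASK again. Inverting before and after cancels both,
    # leaving the raw register advanced over len_b zero bytes.
    shifted = zlib.crc32(bytes(len_b), crc_a ^ MASK) ^ MASK
    return shifted ^ crc_b

def crc32_three_way(data: bytes) -> int:
    """CRC-32 of data computed from three independently hashed pieces."""
    third = len(data) // 3
    parts = [data[:third], data[third:2 * third], data[2 * third:]]
    # The three CRCs do not depend on each other, so they could run in parallel.
    crcs = [zlib.crc32(part) for part in parts]
    total = crcs[0]
    for part, crc in zip(parts[1:], crcs[1:]):
        total = crc32_combine(total, crc, len(part))
    return total
```

Under these assumptions, `crc32_three_way(data)` should equal `zlib.crc32(data)` for any input.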
Note, however, that the CityHash functions would need to be modified for this purpose, or an arbitrary choice would need to be made to define a consistent meaning for the CityHash value of a large stream of data. The reason is that those functions are not set up to accept buffered data, i.e. feeding the functions a chunk at a time and expecting the same result as if the entire data set had been fed to the function at once. The CityHash functions would need to be modified to update an intermediate state.
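For contrast, this is the streaming behaviour that a CRC interface typically already offers. The sketch uses Python's `zlib.crc32` (standard CRC-32, standing in for CRC-32C), whose second argument is the running value from the previous chunk; `crc32_streamed` is a hypothetical helper name.

```python
import zlib

def crc32_streamed(data: bytes, chunk_size: int) -> int:
    """Feed data to zlib.crc32 one chunk at a time, carrying the running CRC."""
    crc = 0  # starting value for an empty stream
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        crc = zlib.crc32(chunk, crc)  # update the intermediate state
    return crc
```

Whatever `chunk_size` is chosen, the result matches a single `zlib.crc32(data)` call, which is the property the unmodified CityHash functions lack.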
The alternative, and what I did for the quick and dirty testing, is to use the WithSeed versions of the functions, with the CityHash of the previous buffer as the seed for the next buffer. The problem is that the result then depends on the buffer size. If you feed different-sized buffers to CityHash with this approach, you get different hash values.
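The buffer-size dependence can be reproduced with any seeded hash. The sketch below is not CityHash: it assumes keyed `hashlib.blake2b` from the Python standard library as a stand-in for a WithSeed function, and `chained_hash` is a hypothetical helper name.

```python
import hashlib

def chained_hash(data: bytes, chunk_size: int) -> bytes:
    """Hash data chunk by chunk, seeding each chunk with the previous digest."""
    seed = b""  # empty key: the first chunk is hashed unseeded
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Stand-in for a WithSeed call: the previous digest keys the next hash.
        seed = hashlib.blake2b(chunk, digest_size=8, key=seed).digest()
    return seed
```

Calling `chained_hash(data, 1024)` and `chained_hash(data, 4096)` on the same data yields different digests, so the buffer size effectively becomes part of the hash definition and both sides must agree on it.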
Another update four years later:
Even faster is the xxhash family. I would now recommend it over a CRC for a non-critical hash.
Mark Adler