Does the random number generator plow a reasonable hash function many times?

I want to generate a large amount of random data that is reproduced for a given key containing a list of numbers:

 [a, b, c, d, e, ...] 

It is the next good or reasonable way to get RNGs in a position to generate random data in such a way that for each n-tuple [a, b, c, ..., n] this data is not correlated with the output for "adjacent" n -tuples [a+1, b, c, ..., n] , [a, b+1, c, ..., n] , etc.

 srand(a); srand(rand() * b); srand(rand() * c); ... srand(rand() * n); # generate random data: for (int i=0; i < 100; +i) printf("%d", rand()); 

I think this question boils down to the following: is rand_hash good hash function for a 2-tuple (a, b) ?

 int rand_hash(int a, int b) { srand(a); srand(rand() * b); return rand(); } 

NB: I do not want to mean that srand and rand are any specific RNG implementation. Assume for the sake of argument that we are using good Mersenne Twister code.

Change If this is not clear, by “reasonable hash function” I mean the following. In the limited case of a 2-tuple [a, b] , the output of rand_hash should be uniform in the range of int , and (as a rule) there should be no correlation between the magnitude of the change in a or b and the magnitude of the change in the return value.

+7
source share
3 answers

No, this is not a reasonable approach.

  • You do not know what the rand implementation is. Random number generators are designed to provide approximately evenly distributed numbers over a period of several mnumbers generated. They are not intended to provide evenly distributed numbers across multiple (32-bit) seeds. In your hypothetical case, the mersenne_twister random number generator has a state much larger than the integer that you point to srand (in particular, 624*sizeof(int) ). Most of the power that the RNG must provide for its output to be random and uniform from this additional state, and you removed it. (Seed can only be one of 2 ^ 32 states)
  • If you have ever upgraded your compiler or libraries or something like that, anything you could serialize to disk would become unreadable. (If rand is a black box, no one says that tomorrow the implementation matches today).
  • The output of the hash function returns the same for the same inputs on srand . Therefore, you already have a hash - the srand entry. The RNG generates the same output for a given srand entry. Therefore, the number of hashes that you can get is no more than just returning the hash that you have already calculated. If your initial hash in srand has a poor distribution for the hash table, then scale the hash accordingly so that it performs well in your table.
  • For some rand implementations, this does extremely poorly. Consider a linear congruent generator (which is more common with C libraries because it has the sizeof(int) state - for example, the BSD generator ). LCG follows the form xNext = a*xCurrent + b . Consider:

     static int seed = 0; void srand(int newSeed) { seed = newSeed; } int rand() { seed = (int) ((1103515245 * ((unsigned int)seed) + 12345) & 0x7fffffffUL); return seed; } 

    Note that this (generic) type of generator generates hash values ​​that easily correlate with your input values.

+9
source

How about using boost::hash_combine http://www.boost.org/doc/libs/1_33_1/doc/html/hash_combine.html to create an initial seed? Using srand more than once always causes red flags to be remembered.

+2
source

Potential issue:

What if another thread calls rand() in the middle of your hash function?

+1
source

All Articles