What is the effective seed range for Ruby rand?

Ruby implements PRNG as "a modified Mersenne Twister with a period of 2 ** 19937-1." one

The way I understand MT is that it works on 2 ^ 32 different seeds. What confuses me is that Random.new(seed) accepts arbitrarily large numbers, such as Random.new(2**100) .

However, I was not able to find the (logical) collisions:

 Random.new(1).rand(10**5) == Random.new(2**32-1).rand(10**5) => false Random.new(1).rand(10**5) == Random.new(2**32).rand(10**5) => false Random.new(1).rand(10**5) == Random.new(2**32+1).rand(10**5) => false 

Given that we would like to use the maximum MT seed range in the sense that we want to use as many different seeds as possible, while avoiding collisions with two different seeds, what seed range achieves this?

I tried to understand what was going on inside a random Ruby implementation, but didn't go too far. https://github.com/ruby/ruby/blob/c5e08b764eb342538884b383f0e6428b6faf214b/random.c#L370

+8
ruby random random-seed mersenne-twister
source share
1 answer

The Mersenne Twister sequence is long 2 ** ( 624 * 32 - 1 ) - 1 , and the initial value is used to set the internal state for the PRNG, which directly refers to the position within this sequence.

The easiest to search repeat seems to be every 2 ** ( 624 * 32 ) , and it can be shown that it works as follows:

  repeat_every = 2 ** ( 624 * 32 ) start_value = 5024214421 # Try any value r1 = Random.new( start_value ) r2 = Random.new( start_value + repeat_every ) r17 = Random.new( start_value + 17 * repeat_every ) r23 = Random.new( start_value + 23 * repeat_every ) r1.rand == r2.rand # true r17.rand == r23.rand # true 

Or try the following:

  repeat_every = 2 ** ( 624 * 32 ) start_value = 5024214421 # Try any value r1 = Random.new( start_value ) r2 = Random.new( start_value + repeat_every ) Array.new(10) { r1.rand(100) } # => [84, 86, 8, 58, 5, 21, 79, 10, 17, 50] Array.new(10) { r2.rand(100) } # => [84, 86, 8, 58, 5, 21, 79, 10, 17, 50] 

The repetition value refers to how Mersenne Twister works. The internal state of MT is an array of 624 32-bit unsigned integers. The Ruby source code you linked contains Ruby Fixnum in an array - a magic command

  rb_integer_pack( seed, buf, len, sizeof(uint32_t), 0, INTEGER_PACK_LSWORD_FIRST|INTEGER_PACK_NATIVE_BYTE_ORDER ); 

however, this is not so easy to play with, it is defined in internal.h , therefore it is only available if you are working with the Ruby interpreter itself. You cannot access this function from the regular C extension.

The packed integer is then loaded into the MT internal state by the init_by_array function. This is a rather complicated function - the value of the packed seed is not literally written to the state, but instead, the state is generated for each element, adding to the given array values ​​using a lot of xors, additions and cross-references to the previous value (the Ruby source here also adds the packed index to the position array, commented "nonlinear", I think this is one of the reference modifications to the MT standard)

Note that the size of the MT sequence is smaller than 2 ** ( 624 * 32 ) - the repeat_every value that I show here skips 2 sequences at a time, but the easiest way to find a duplicate starting value is because it is easy to see. how it sets the internal state in exactly the same way (since the first 624 elements in the seed array view are identical, and that's all that is used later). Other initial values ​​will also cause the same internal state, but this relation is a complex mapping that combines every integer in the binary space 19938 with another integer that creates the same state for MT.

+10
source share

All Articles