Difference between Python 2 and 3 for shuffling with a given seed

I am writing a program that is compatible with both Python 2.7 and 3.5. Some parts of it rely on a random process. My unit tests use an arbitrary seed, which gives the same results across all runs and Python versions... except for code that uses random.shuffle.

An example in Python 2.7:

    In []: import random
           random.seed(42)
           print(random.random())
           l = list(range(20))
           random.shuffle(l)
           print(l)
    Out[]: 0.639426798458
           [6, 8, 9, 15, 7, 3, 17, 14, 11, 16, 2, 19, 18, 1, 13, 10, 12, 4, 5, 0]

The same input in Python 3.5:

    In []: import random
           random.seed(42)
           print(random.random())
           l = list(range(20))
           random.shuffle(l)
           print(l)
    Out[]: 0.6394267984578837
           [3, 5, 2, 15, 9, 12, 16, 19, 6, 13, 18, 14, 10, 1, 11, 4, 17, 7, 8, 0]

Note that the pseudo-random number is the same, but the shuffled lists are different. As expected, re-executing cells does not change their output.

How can I write the same test code for two versions of Python?

+7
3 answers

In Python 3.2, the random module was refactored a little to make the output uniform across architectures (given the same seed); see issue #7889. The shuffle() method was switched to use Random._randbelow().

However, the _randbelow() method was also adjusted, so simply copying the 3.5 version of shuffle() is not enough to fix this.

That said, if you pass in your own random() function, the implementation in Python 3.5 is unchanged from the 2.7 version, which lets you bypass this limitation:

 random.shuffle(l, random.random) 

Note, however, that you are then exposed again to the old 32-bit vs. 64-bit architecture differences that issue #7889 was meant to resolve.
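For reference, the code path that shuffle() takes when a random function is passed is short enough to write out by hand. The sketch below is a hand-written port, not the standard library source: legacy_shuffle is a hypothetical name, and the loop assumes the Python 2.7 behaviour of picking j = int(random() * (i + 1)) and swapping. It can also be used on interpreters where random.shuffle() no longer accepts a second argument (that parameter was deprecated in Python 3.9 and removed in 3.11).

```python
import random


def legacy_shuffle(x, rand=random.random):
    """Shuffle list x in place using floats from rand(), as Python 2.7 does.

    Hypothetical helper, not part of the standard library.
    """
    for i in reversed(range(1, len(x))):
        # Pick an index in [0, i] by scaling a float from [0.0, 1.0).
        j = int(rand() * (i + 1))
        x[i], x[j] = x[j], x[i]


random.seed(42)
random.random()          # consume one value, as in the question
l = list(range(20))
legacy_shuffle(l)
print(l)
```

With the seed and call order from the question, this should reproduce the Python 2.7 list shown there, subject to the same architecture caveat.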

Ignoring a few optimizations and special cases, if you include _randbelow(), the 3.5 version can be backported as:

    import random
    import sys

    if sys.version_info >= (3, 2):
        newshuffle = random.shuffle
    else:
        try:
            xrange
        except NameError:
            xrange = range

        def newshuffle(x):
            def _randbelow(n):
                "Return a random int in the range [0,n). Raises ValueError if n==0."
                getrandbits = random.getrandbits
                k = n.bit_length()  # don't use (n-1) here because n can be 1
                r = getrandbits(k)  # 0 <= r < 2**k
                while r >= n:
                    r = getrandbits(k)
                return r

            for i in xrange(len(x) - 1, 0, -1):
                # pick an element in x[:i+1] with which to exchange x[i]
                j = _randbelow(i+1)
                x[i], x[j] = x[j], x[i]

which gives you the same result on 2.7 as 3.5:

    >>> random.seed(42)
    >>> print(random.random())
    0.639426798458
    >>> l = list(range(20))
    >>> newshuffle(l)
    >>> print(l)
    [3, 5, 2, 15, 9, 12, 16, 19, 6, 13, 18, 14, 10, 1, 11, 4, 17, 7, 8, 0]
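In a test suite it may be more convenient to avoid the module-level generator altogether. The following is a minimal sketch of the same algorithm as newshuffle(), rewritten to take an explicit random.Random instance; portable_shuffle and shuffled_with_seed are hypothetical names introduced here for illustration, and the extra rng.random() call only mirrors the call order used in the question.

```python
import random


def portable_shuffle(x, rng):
    """Shuffle list x in place, drawing integers from rng.getrandbits().

    Hypothetical helper: the same algorithm as newshuffle(), but it uses
    an explicit random.Random instance instead of the global state.
    """
    def randbelow(n):
        # Rejection sampling: draw k-bit integers until one falls below n.
        k = n.bit_length()
        r = rng.getrandbits(k)
        while r >= n:
            r = rng.getrandbits(k)
        return r

    for i in range(len(x) - 1, 0, -1):
        # Pick an element in x[:i+1] with which to exchange x[i].
        j = randbelow(i + 1)
        x[i], x[j] = x[j], x[i]


def shuffled_with_seed(seed, n=20):
    rng = random.Random(seed)
    rng.random()  # mirror the call order used in the question
    items = list(range(n))
    portable_shuffle(items, rng)
    return items


first = shuffled_with_seed(42)
second = shuffled_with_seed(42)
print(first == second)
```

Because each call builds its own generator, one test cannot disturb the random state seen by another.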
+16

Building on Martijn Pieters' excellent answer and comments, and on this discussion, I finally found a workaround that arguably does not answer my question as asked, but that also does not require deep changes. To summarize:

  • random.seed does make every random function deterministic, but does not necessarily produce the same output across versions;
  • setting PYTHONHASHSEED to 0 disables hash randomization for dictionaries and sets, which by default introduces a source of non-determinism in Python 3.

So, in the bash script that runs the Python 3 tests, I added:

 export PYTHONHASHSEED=0 
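One way to check the effect of this variable is to compare string hashes computed by two separate interpreter processes. This is only an illustrative sketch: string_hash_in_child is a hypothetical helper, and the string 'spam' is an arbitrary choice.

```python
import os
import subprocess
import sys


def string_hash_in_child(hash_seed):
    """Return hash('spam') as computed by a fresh interpreter.

    hash_seed is handed to the child process through PYTHONHASHSEED.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('spam'))"], env=env
    )
    return int(out.decode().strip())


run_a = string_hash_in_child(0)
run_b = string_hash_in_child(0)
print(run_a == run_b)
```

With the variable set to 0, both child processes should report the same hash, so the iteration order of sets of strings no longer varies from run to run.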

Then I temporarily modified my test functions to brute-force my way to an integer seed that reproduces in Python 3 the results expected in Python 2. Finally, I reverted those changes and replaced the lines:

 seed(42) 

with something like this:

 seed(42 if sys.version_info.major == 2 else 299) 

Nothing to brag about, but as the saying goes, practicality beats purity ;)

This quick workaround can be useful for those who want to test the same stochastic code in different versions of Python!
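The brute-force search for a matching seed could look roughly like the sketch below. It is not the code actually used for this answer: find_seed and shuffled are hypothetical names, and the four-element list and the expected value are made up so that the search finishes quickly.

```python
import random


def find_seed(produces_expected, candidates):
    """Return the first seed for which produces_expected(seed) is true.

    Hypothetical helper; returns None when no candidate matches.
    """
    for seed in candidates:
        if produces_expected(seed):
            return seed
    return None


def shuffled(seed):
    # Stand-in for the stochastic code under test.
    items = list(range(4))
    random.Random(seed).shuffle(items)
    return items


# Pretend this is the result recorded under the other Python version.
expected = [3, 1, 0, 2]
seed = find_seed(lambda s: shuffled(s) == expected, range(10000))
print(seed)
```

This only pays off when the tested outcome has few possible values. A 20-element list has 20! possible orderings, so an exact match on the full shuffle is not something a seed search will realistically find.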

+1

Someone correct me if I am wrong, but it seems that the numpy.random module does not change between Python 2 and 3.

    >>> import numpy as np
    >>> l = list(range(20))
    >>> np.random.RandomState(42).shuffle(l)
    >>> l
    [0, 17, 15, 1, 8, 5, 11, 3, 18, 16, 13, 2, 9, 19, 4, 12, 7, 10, 14, 6]

I got the same result in Python 2.7 (with np 1.12.1) and 3.7 (with np 1.14.5).

The documentation also states that the generated numbers should be the same between versions.

Compatibility guarantee: A fixed seed and a fixed series of calls to RandomState methods using the same parameters will always produce the same results, up to roundoff error, except when the values were incorrect. Incorrect values will be fixed, and the NumPy version in which the fix was made will be noted in the relevant docstring. Extending existing parameter ranges and adding new parameters is allowed as long as the previous behavior remains unchanged.

0
