I want to investigate whether a sequence of pseudo-random numbers generated in one software package can be matched in another. I am as interested in understanding all the possible points of divergence as I am in finding a way to reconcile them.
Why? I work in a data store that uses many different software packages (Stata, R, Python, SAS, possibly others). There has recently been interest in QCing findings by replicating processes in another language. For any process that includes random numbers, it would be useful if we could provide a series of steps ("set this parameter", etc.) that allow two packages to be consistent. If this is not feasible, I would like to be able to articulate where the points of failure are.
A simple example:
The default random number generator in both R and Python is the Mersenne Twister. I gave both the same seed and also tried to look at the "state" of the PRNG. None of the values are consistent.
R (3.2.3, 64-bit):
set.seed(20160201)
.Random.seed
sample(c(1, 2, 3, 4, 5))
Python (3.5.1, 64-bit):
import random
random.seed(20160201)
random.getstate()
random.sample([1, 2, 3, 4, 5], 5)
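To narrow down where the two packages part ways, it may help to look at the Python side one layer at a time. The following is only a minimal sketch, assuming CPython's standard `random` module: it unpacks the tuple returned by `random.getstate()`, draws raw 32-bit outputs instead of a shuffled sample, and confirms that restoring the state replays the same stream. The names `SEED`, `words`, `position`, and `raw` are mine, for illustration only.

```python
import random

SEED = 20160201  # the seed used in the question

random.seed(SEED)
version, internal, gauss_next = random.getstate()

# CPython keeps the 624 Mersenne Twister words followed by one position index.
words, position = internal[:-1], internal[-1]
print(version, len(words), position)

# Raw 32-bit draws sidestep any difference in how each package converts
# generator output into floats or into a shuffled sample.
raw = [random.getrandbits(32) for _ in range(5)]
print(raw)

# Restoring the saved state replays the identical stream within Python.
random.setstate((version, internal, gauss_next))
replayed = [random.getrandbits(32) for _ in range(5)]
print(raw == replayed)
```

If the 624 words already differ from what R reports, the divergence is in how each package expands the integer seed into a full state, before any numbers are drawn. As far as I understand it, R's `.Random.seed` carries extra leading entries (a generator code and a position) and stores the words as signed integers, so a direct comparison would need to account for that; I have not verified this against the R sources.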
ElizabethAB