Using a .txt file of random numbers with the dieharder test suite

I have several .txt files, each containing a large number of integers (about 2.5 million) produced by a different RNG. I want to use a test suite to test these RNGs.

The .txt files are as follows:

    #==============================================
    # generator Park  seed = 1
    #=============================================
    type: d
    count: 2500000
    numbit: 32
    16807
    282475249

Of course, more integers follow. I use the following command to run dieharder on this .txt file:

    dieharder -f randdata.txt -a -g 202

My question is: is the .txt file correct (in particular, the first few lines), and why are those lines necessary? I ask because every .txt file I generate, from good RNGs and bad ones alike, fails almost every test, and I wonder whether this is due to a mistake I made in preparing the .txt file or whether my RNGs are simply bad.

1 answer

Yes, this input file looks correct. It seems that a number of dieharder tests fail even with 10M inputs generated by dieharder's own generator:

    $ dieharder -o -f example.input -t 10000000 # Generate an input file
    $ head -n 10 example.input
    #==================================================================
    # generator mt19937  seed = 3423143424
    #==================================================================
    type: d
    count: 10000000
    numbit: 32
    2310531048
     808929469
    2423056114
    4237891648
    $ dieharder -a -g 202 -f example.input
    #=============================================================================#
    #            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
    #=============================================================================#
       rng_name    |   filename    |rands/second|
         file_input| example.input |  2.50e+06  |
    #=============================================================================#
            test_name   |ntup| tsamples |psamples|  p-value |Assessment
    #=============================================================================#
    # The file file_input was rewound 1 times
       diehard_birthdays|   0|       100|     100|0.07531570|  PASSED
    # The file file_input was rewound 11 times
          diehard_operm5|   0|   1000000|     100|0.00000000|  FAILED
    # The file file_input was rewound 24 times
      diehard_rank_32x32|   0|     40000|     100|0.00047786|   WEAK
    # The file file_input was rewound 30 times
        diehard_rank_6x8|   0|    100000|     100|0.38082242|  PASSED
    # The file file_input was rewound 32 times
       diehard_bitstream|   0|   2097152|     100|0.56232583|  PASSED
    # The file file_input was rewound 53 times
            diehard_opso|   0|   2097152|     100|0.83072458|  PASSED

I don’t know exactly how many samples you need to get “better” results, but failures with only 2.5 million numbers seem to be what you might expect.
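To rule out a formatting mistake in the file itself, the header can be written by a short script rather than by hand. The following is a minimal sketch, not the asker's actual code: it assumes the "Park" generator is the Park-Miller minimal standard generator (the first two values in the question, 16807 and 282475249, match its output for seed 1), and the names `park_miller` and `write_dieharder_ascii` are hypothetical. The header layout simply mirrors the files shown in this thread.

```python
import os
import tempfile

M = 2**31 - 1  # modulus of the Park-Miller "minimal standard" generator (assumed)
A = 16807      # its multiplier (assumed)


def park_miller(seed, count):
    """Yield `count` successive values of x -> A*x mod M, starting from `seed`."""
    x = seed
    for _ in range(count):
        x = (A * x) % M
        yield x


def write_dieharder_ascii(path, values, generator, seed, numbit=32):
    """Write values using the text layout shown in this thread.

    numbit defaults to 32 because that is what the question's file uses;
    whether that is the right value for a given generator is not checked here.
    """
    values = list(values)  # materialize so the count line can be written first
    bar = "#" + "=" * 68
    with open(path, "w") as f:
        f.write(bar + "\n")
        f.write(f"# generator {generator}  seed = {seed}\n")
        f.write(bar + "\n")
        f.write("type: d\n")
        f.write(f"count: {len(values)}\n")
        f.write(f"numbit: {numbit}\n")
        for v in values:
            f.write(f"{v}\n")


if __name__ == "__main__":
    # Write a tiny 10-value file and print it for visual comparison.
    path = os.path.join(tempfile.gettempdir(), "randdata_sketch.txt")
    write_dieharder_ascii(path, park_miller(1, 10), "Park", 1)
    with open(path) as f:
        print(f.read())
```

Comparing the first lines of such a file with the output of `head` on a dieharder-generated file is a quick way to spot a malformed header.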

After some experiments, it seems the tests start to pass with ~120 MB of random binary data:

    $ dd if=/dev/urandom of=/tmp/random bs=4096 count=30000
    30000+0 records in
    30000+0 records out
    122880000 bytes transferred in 10.873818 secs (11300538 bytes/sec)
    $ du -sh /tmp/random
    117M    /tmp/random
    $ dieharder -a -g 201 -f /tmp/random
    #=============================================================================#
    #            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
    #=============================================================================#
       rng_name    |   filename    |rands/second|
     file_input_raw|  /tmp/random  |  1.11e+07  |
    #=============================================================================#
            test_name   |ntup| tsamples |psamples|  p-value |Assessment
    #=============================================================================#
       diehard_birthdays|   0|       100|     100|0.71230346|  PASSED
    # The file file_input_raw was rewound 3 times
          diehard_operm5|   0|   1000000|     100|0.62093817|  PASSED
    # The file file_input_raw was rewound 7 times
      diehard_rank_32x32|   0|     40000|     100|0.02228171|  PASSED
    # The file file_input_raw was rewound 9 times
        diehard_rank_6x8|   0|    100000|     100|0.20698623|  PASSED
    # The file file_input_raw was rewound 10 times
       diehard_bitstream|   0|   2097152|     100|0.55567887|  PASSED
    # The file file_input_raw was rewound 17 times
            diehard_opso|   0|   2097152|     100|0.20799917|  PASSED

That corresponds to 122,880,000 / 4 = 30,720,000, or roughly 31M, 32-bit integers.
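The arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation: it assumes the raw file is consumed as 4-byte (32-bit) words, as the division by 4 implies, and the helper name `ints_in_raw_file` is made up for illustration.

```python
BYTES_PER_INT = 4  # assumption: raw input is read as 32-bit words


def ints_in_raw_file(n_bytes, bytes_per_int=BYTES_PER_INT):
    """Number of whole integers contained in a raw file of n_bytes."""
    return n_bytes // bytes_per_int


raw_bytes = 4096 * 30000              # bs * count from the dd command
n_ints = ints_in_raw_file(raw_bytes)  # integers available to dieharder
ratio = n_ints / 2_500_000            # relative to the question's 2.5M values

print(f"{raw_bytes} bytes -> {n_ints} integers")
print(f"about {ratio:.1f} times the size of the 2.5M-integer files")
```

By this count, the raw file that passed holds roughly twelve times as many values as each of the .txt files in the question.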
