I tested 2 set-based randomization methods for RAND (), generating 100,000,000 lines each. To level the field, the output is a float between 0-1 and simulate RAND (). Most of the code is a testing framework, so I generalize the algorithms here:
-- Try
Using CRYPT_GEN_RANDOM was by far the most random, since there is only a chance .000000001% can see even 1 duplicate when it plucks 10 ^ 8 numbers from a set of 10 ^ 18 numbers. IOW we should not have seen duplicates, and this was not! This kit took 44 seconds to generate on my laptop.
Cnt Pct ----- ---- 1 100.000000 --No duplicates
SQL Server Runtime: CPU time = 134795 ms, elapsed time = 39274 ms.
IF OBJECT_ID('tempdb..#T0') IS NOT NULL DROP TABLE
Almost 15 orders of magnitude less than random, this method was not quite twice as fast, in just 23 seconds, to generate 100 M numbers.
Cnt Pct ---- ---- 1 95.450254 -- only 95% unique is absolutely horrible 2 02.222167 -- If this line were the only problem I'd say DON'T USE THIS! 3 00.034582 4 00.000409 -- 409 numbers appeared 4 times 5 00.000006 -- 6 numbers actually appeared 5 times
SQL Server Runtime: CPU time = 77156 ms, elapsed time = 24613 ms.
IF OBJECT_ID('tempdb..#T1') IS NOT NULL DROP TABLE
Only RAND () is useless for set-based generation, so generating a baseline for comparing randomness takes more than 6 hours and needs to be restarted several times to finally get the correct number of output lines. It also seems that randomness leaves much to be desired, although it is better than using a checksum (newid ()) to re-feed each line.
Cnt Pct ---- ---- 1 99.768020 2 00.115840 3 00.000100 -- at least there were comparitively few values returned 3 times
Due to restarts, runtime cannot be captured.
IF OBJECT_ID('tempdb..#T2') IS NOT NULL DROP TABLE