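Assign a unique GUID to each row in an indexed column (but not the clustered index). Pick three hexadecimal characters at random, and then select all the rows where the GUID begins with those characters.
A GUID is written in hex digits (0-9, A-F), so there are 16 choices per character: 16*16*16 = 4,096 prefixes, and 5,500,000 / 4,096 ~= 1,342.77 rows per prefix on average.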
Since newid() does not follow a pattern, this gives you a random grouping that is evenly distributed, and if you reuse the same three characters you will always get the same rows back, which makes the sample repeatable.
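To make the distribution and repeatability claims concrete, here is a rough simulation in Python. It is only a sketch: random version-4 UUIDs stand in for newid() values, the 200,000-row list stands in for the real table, and `sample_by_prefix` is a hypothetical helper, not anything from SQL Server.

```python
import random
import uuid
from collections import Counter

# Assumption: random version-4 UUIDs behave like newid() values for this purpose.
# The fixed seed only makes the demo reproducible.
rng = random.Random(42)
row_guids = [
    str(uuid.UUID(int=rng.getrandbits(128), version=4))
    for _ in range(200_000)
]


def sample_by_prefix(guids, prefix):
    """Return the GUIDs whose text form starts with the given prefix."""
    prefix = prefix.lower()
    return [g for g in guids if g.startswith(prefix)]


# How many rows fall under each three-character prefix?
bucket_sizes = Counter(g[:3] for g in row_guids)
expected = len(row_guids) / 16**3  # average rows per prefix

# The same prefix always selects the same rows.
first = sample_by_prefix(row_guids, "a3f")
second = sample_by_prefix(row_guids, "a3f")

print("distinct prefixes:", len(bucket_sizes))
print("expected per prefix:", round(expected, 2))
print("smallest / largest bucket:", min(bucket_sizes.values()), max(bucket_sizes.values()))
print("repeatable:", first == second)
```

The bucket sizes scatter around the average but stay in the same ballpark, which is what "evenly distributed" amounts to here.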
If this is not fast enough, add another column that holds just the first three characters and index that.
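A minimal sketch of the prefix-column idea, using Python's built-in sqlite3 as a stand-in for the real database. The table and column names (`items`, `row_guid`, `guid_prefix`) are made up for illustration, and the GUIDs come from `uuid.uuid4()` rather than newid(). In SQL Server the prefix column could presumably be a computed column instead, but that is not shown here.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    " id INTEGER PRIMARY KEY,"
    " row_guid TEXT NOT NULL,"
    " guid_prefix TEXT NOT NULL)"  # first three characters of row_guid
)

# Hypothetical data: 50,000 rows, each with a random GUID and its prefix.
rows = [(g, g[:3]) for g in (str(uuid.uuid4()) for _ in range(50_000))]
conn.executemany("INSERT INTO items (row_guid, guid_prefix) VALUES (?, ?)", rows)

# Index the short prefix column so the lookup is a plain equality seek.
conn.execute("CREATE INDEX ix_items_guid_prefix ON items (guid_prefix)")


def sample(connection, prefix):
    """Return (id, row_guid) for every row whose GUID starts with prefix."""
    cur = connection.execute(
        "SELECT id, row_guid FROM items WHERE guid_prefix = ? ORDER BY id",
        (prefix.lower(),),
    )
    return cur.fetchall()


prefix = rows[0][1]  # reuse a prefix that is known to exist
picked = sample(conn, prefix)

# Ask SQLite how it would run the lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, row_guid FROM items WHERE guid_prefix = ?",
    (prefix,),
).fetchall()

print(len(picked), "rows for prefix", prefix)
print(plan)
```

Comparing the short column with `=` avoids a string conversion and pattern match on the full GUID for every row.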
(Sorry for the brevity - on my phone)
Bobson