Select random rows with seeding

Using SQL Server, I have a table of about 5.5 million rows, and I want to randomly select a set of 120 rows that match certain criteria.

This is related to "Select n random rows from SQL Server table" and https://msdn.microsoft.com/en-us/library/cc441928.aspx, but my problem is that I want to be able to seed this, so that I can repeatedly select the same 120 random rows, and then get a different random set of rows if I use a different seed.

I could do something like this in my application:

    var rand = new Random(seed);
    var allExamples = db.myTable.Where(/*some condition*/).ToList();
    var subSet = allExamples
        .Select(x => new { x, r = rand.NextDouble() })
        .OrderBy(x => x.r)
        .Take(120)
        .Select(x => x.x)
        .ToList();

That works, but as you might guess, it is slow with 5.5 million rows. So I'm really looking for a way to do this on the SQL Server side, so that I don't need to retrieve and process all the rows.
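For reference, here is a minimal C# sketch of the same client-side idea, made repeatable. It differs from the code above in two deliberate ways: the rows are first sorted by a stable key (the database does not guarantee the order in which it returns rows, so the same seed could otherwise pick different rows), and it swaps in a partial Fisher-Yates shuffle instead of sorting by random doubles. `SeededSample` is a hypothetical helper name, and plain integers stand in for table rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: a repeatable sample of n items for a given seed.
// Sorting by a stable key first means the input order does not affect the result.
static List<T> SeededSample<T, TKey>(IEnumerable<T> items, Func<T, TKey> key, int seed, int n)
{
    var list = items.OrderBy(key).ToList();
    var rand = new Random(seed);
    n = Math.Min(n, list.Count);
    // Partial Fisher-Yates shuffle: only the first n positions need to be randomized.
    for (int i = 0; i < n; i++)
    {
        int j = rand.Next(i, list.Count);
        (list[i], list[j]) = (list[j], list[i]);
    }
    return list.GetRange(0, n);
}

var first = SeededSample(Enumerable.Range(1, 5_000).Reverse(), id => id, seed: 7, n: 120);
var again = SeededSample(Enumerable.Range(1, 5_000), id => id, seed: 7, n: 120);
Console.WriteLine(first.SequenceEqual(again)); // True: same seed, same sample, regardless of input order
```

This still pulls every matching row to the client, so it does nothing for the performance problem; it only makes the seeding behaviour dependable.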

c# sql-server
2 answers

If you want something that looks random, mix your [PrimaryKey] with some other data, such as a seed...

    SELECT TOP (120) *
    FROM [your table]
    -- WHERE <your criteria>
    ORDER BY CHECKSUM([primarykey]) ^ CHECKSUM('your seed')

... this is still a table scan, but it should perform much better than pulling the entire dataset back to the client only for the client to throw away all but 120 rows.
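The idea can be tried outside SQL Server with a small C# sketch: each key is hashed, the hash is XORed with a hash of the seed, and the keys are ordered by the result. This is a stand-in, not a reimplementation: it uses 32-bit FNV-1a instead of SQL Server's CHECKSUM (whose algorithm it does not reproduce), because `string.GetHashCode` is randomized per process on .NET Core and cannot serve as a stable hash. `StableHash` and `PickSeeded` are hypothetical names, and the integers 1 to 10,000 stand in for primary keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// 32-bit FNV-1a over the UTF-8 bytes of a string: a stable, process-independent hash
// (used here only as a stand-in for CHECKSUM).
static uint StableHash(string s)
{
    uint hash = 2166136261u;                  // FNV offset basis
    foreach (byte b in Encoding.UTF8.GetBytes(s))
    {
        hash ^= b;
        hash = unchecked(hash * 16777619u);   // FNV prime; wrap-around is intended
    }
    return hash;
}

// Hypothetical helper: order keys by hash(key) XOR hash(seed), then keep the first n.
static List<int> PickSeeded(IEnumerable<int> keys, string seed, int n)
{
    uint seedHash = StableHash(seed);
    return keys
        .OrderBy(k => StableHash(k.ToString()) ^ seedHash)
        .ThenBy(k => k)                       // tie-breaker keeps the order fully deterministic
        .Take(n)
        .ToList();
}

var keys = Enumerable.Range(1, 10_000).ToList();
var a = PickSeeded(keys, "seed-1", 120);
var b = PickSeeded(keys, "seed-1", 120);
var c = PickSeeded(keys, "seed-2", 120);

Console.WriteLine(a.SequenceEqual(b));            // True: same seed, same rows in the same order
Console.WriteLine(string.Join(", ", a.Take(5)));  // first few picks for "seed-1"
Console.WriteLine(string.Join(", ", c.Take(5)));  // first few picks for "seed-2"
```

The same seed always yields the same 120 keys, while changing the seed changes the ordering, which is the property the question asks for.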


Add a uniqueidentifier (GUID) column, filled with NEWID(), to each row and index it (but not as the clustered index). Pick three characters at random, then select all rows whose GUID begins with those characters.

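    36 * 36 * 36 = 46,656; 5,500,000 / 46,656 ≈ 117.88

(Strictly, the string form of a GUID uses only the 16 hexadecimal digits 0-9 and A-F, so three characters give 16 * 16 * 16 = 4,096 groups of about 1,343 rows each, and four characters give 65,536 groups of about 84 rows each.)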

Because NEWID() values follow no pattern, your random groups will be evenly distributed, and if you use the same three characters you will always get the same rows, which takes care of the seeding.

If this is not efficient enough, add another column holding just the first three characters (for example, a computed column) and index that.
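A minimal C# sketch of the prefix-bucket approach, assuming an integer seed: the seed deterministically picks a three-hex-digit prefix, and the "table" is an in-memory list of 200,000 random GUIDs standing in for a NEWID() column. `PrefixFromSeed` is a hypothetical helper, and the `StartsWith` filter only imitates what a `LIKE 'prefix%'` query against the GUID column would do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: derive a hex prefix from an integer seed.
// GUID strings use the hex digits 0-9a-f, so a 3-character prefix has 4,096 possible values.
static string PrefixFromSeed(int seed, int length = 3)
{
    const string hex = "0123456789abcdef";
    var rand = new Random(seed);          // seeded, so the same seed gives the same prefix
    var chars = new char[length];
    for (int i = 0; i < length; i++)
        chars[i] = hex[rand.Next(hex.Length)];
    return new string(chars);
}

// Simulated table: one random GUID per row, standing in for a NEWID() column.
List<Guid> rows = Enumerable.Range(0, 200_000).Select(_ => Guid.NewGuid()).ToList();

string prefix = PrefixFromSeed(42);
// Guid.ToString() produces lowercase hex, so an ordinal comparison is enough here.
List<Guid> bucket = rows
    .Where(g => g.ToString().StartsWith(prefix, StringComparison.Ordinal))
    .ToList();

Console.WriteLine($"prefix {prefix}: {bucket.Count} rows (about {rows.Count / 4096.0:F1} expected on average)");
```

With 200,000 rows and 4,096 possible prefixes, a bucket holds about 49 rows on average, and the exact count varies from seed to seed, so on the real table the bucket size is only approximately what the arithmetic predicts.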

(Sorry for the brevity - on my phone)
