SQL: selecting a sample of rows

I need to select a sample of rows from a result set. For example, if my select query returns x rows and x is greater than 50, I want to return only 50 rows, and not just any 50, but 50 evenly distributed over the result set. The table in this case records routes: GPS locations with a DateTime. I order by DateTime and need a reasonable sample of the latitude and longitude values. Thanks in advance. [SQL Server 2008]

+4
4 answers

To get a random sample of rows in SQL Server, use this query:

SELECT TOP 50 * FROM Table ORDER BY NEWID(); 

If you want to get every nth row (every 10th, in this example), try this query:

 SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (ORDER BY [Column] ASC) AS RowNum FROM [Table] ) AS Ranking WHERE RowNum % 10 = 0; 


Additional examples of queries that select random rows in other popular RDBMSs can be found here: http://www.petefreitag.com/item/466.cfm

+3

To get 50 rows by taking every nth row:

 SELECT * FROM ( SELECT t.*, row_number() over() AS rn FROM table t ) AS numbered WHERE MOD(rn, (SELECT Count(*) FROM table) / 50) = 0 FETCH FIRST 50 ROWS ONLY 

And if you need a random sample, go with jimmy_keen's answer.

UPDATE: Since this has to run on MS SQL, I think it should be changed to the following (I have no MS SQL Server at hand to test it):

  SELECT TOP 50 * FROM ( SELECT t.*, ROW_NUMBER() OVER (ORDER BY t.[DateTime]) AS rn, (SELECT COUNT(*) FROM [table]) / 50 AS step FROM [table] t ) AS numbered WHERE rn % step = 0 ORDER BY rn 
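
Note that a fixed step combined with TOP 50 can stop short of the end of the route (for 120 rows the step is 2, so only rows 2 to 100 come back), and the step is 0 when there are fewer than 50 rows. A minimal sketch of one alternative, which swaps the modulo filter for NTILE: split the ordered rows into 50 nearly equal buckets and keep the first row of each. The table dbo.RoutePoints and its columns Latitude, Longitude and RecordedAt are hypothetical names, not taken from the question.

```sql
-- Hypothetical schema: dbo.RoutePoints(Latitude, Longitude, RecordedAt).
WITH bucketed AS (
    -- NTILE(50) assigns each row, in time order, to one of up to 50 buckets
    -- whose sizes differ by at most one row.
    SELECT Latitude, Longitude, RecordedAt,
           NTILE(50) OVER (ORDER BY RecordedAt) AS bucket
    FROM dbo.RoutePoints
),
ranked AS (
    -- Number the rows inside each bucket, again in time order.
    SELECT Latitude, Longitude, RecordedAt,
           ROW_NUMBER() OVER (PARTITION BY bucket ORDER BY RecordedAt) AS pos
    FROM bucketed
)
-- Keep the first row of every bucket: at most 50 rows spread over the whole set.
SELECT Latitude, Longitude, RecordedAt
FROM ranked
WHERE pos = 1
ORDER BY RecordedAt;
```

With fewer than 50 rows, every row lands in its own bucket, so the whole set is returned unchanged.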
+3

I suggest adding a computed column that holds a random number to your result set, then selecting the top 50 rows sorted by that column. This gives you a random sample.

For instance:

 SELECT TOP 50 *, RAND(Id) AS Random FROM SourceData ORDER BY Random 

where SourceData is your source data table or view. This assumes, by the way, T-SQL on SQL Server 2008. It also assumes that your data source has an Id column with unique values. If your Ids are small numbers, it helps to multiply them by a large integer before passing them to RAND, for example:

 RAND(Id * 10000000) 
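
One caveat: as far as I know, RAND returns very similar values for neighbouring seeds, which is why small Ids end up sorted almost in Id order, and if Id is an INT, Id * 10000000 overflows once Id reaches 215. A hedged sketch of an alternative that keeps the idea of a repeatable pseudo-random order derived from Id, but hashes the Id with HASHBYTES instead of seeding RAND. SourceData and Id are the names used above; the hash-based ordering is a substitution, not part of the suggestion above.

```sql
-- Repeatable pseudo-random sample: the sort key depends only on Id,
-- so the same 50 rows come back on every run over unchanged data.
SELECT TOP 50 *
FROM SourceData
ORDER BY HASHBYTES('MD5', CAST(Id AS VARBINARY(8)));
```

If a different sample is wanted on each run, ORDER BY NEWID() is the simpler choice.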
+1

If you need a statistically correct sample, TABLESAMPLE is the wrong solution. A good solution, based on work from Microsoft Research, is to create a materialized view of your table that includes an additional column, for example CAST(ROW_NUMBER() OVER (...) AS BYTE) AS RAND_COL_. You can then index this column together with other columns of interest and get statistically correct samples for your queries fairly quickly (using WHERE RAND_COL_ = 1).

0

Source: https://habr.com/ru/post/1313306/

