Is there a SQL Server function to display pseudo-random database entries?

I need a way to show a certain number of random records from a database table, where the selection depends heavily on each record's date and time of creation.

e.g.:

  • show 10 records in random order, but
  • show the later ones with greater frequency than the earlier ones

Say there are 100 entries in the news table:

  • the last entry (by date) should have an almost 100% chance of being selected
  • the first entry (by date) should have an almost 0% chance of being selected
  • the 50th entry (by date) should have a 50% chance of being selected

Is there such a thing in MSSQL directly? Or is there some feature (best practice) in C# that I can use for this?

Thanks

** edit: the title is really terrible, I know. Feel free to edit if you have a more descriptive one. Thanks

+4
5 answers

A fairly simple way might be something like the following. Or at least it can give you a start.

 WITH N AS
 (
     SELECT id,
            headline,
            created_date,
            POWER(ROW_NUMBER() OVER (ORDER BY created_date ASC), 2)     /* row number squared */
              * ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) AS [Weight] /* random number */
     FROM news
 )
 SELECT TOP 10 id, headline, created_date
 FROM N
 ORDER BY [Weight] DESC
+4

For random sampling, see Limiting Result Sets with TABLESAMPLE. For instance, to select a sample of 100 rows from a table:

 SELECT FirstName, LastName FROM Person.Person TABLESAMPLE (100 ROWS); 

For a weighted selection with a preference for the most recent entries (I skipped that part the first time I read the question), Martin's solution is better.

+2

You can draw N from an exponential distribution (for example), then SELECT TOP (N) ordered by date descending and take the last row returned. The rate of the exponential can be tuned according to the number of existing rows.
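The answer gives no code; as a sketch, here is one way to draw such a row offset in Python (the function name and the default rate are illustrative, not part of the answer):

```python
import math
import random

def pick_recent_biased_offset(row_count: int, rate: float = 3.0) -> int:
    """Draw a row offset from an exponential distribution truncated to [0, 1).

    Offset 0 corresponds to the newest row (ORDER BY created_date DESC);
    a larger `rate` concentrates picks on the most recent rows.
    """
    u = random.random()
    # Inverse-CDF sampling of an exponential, truncated so that x < 1
    x = -math.log(1 - u * (1 - math.exp(-rate))) / rate
    return int(x * row_count)
```

The resulting offset could then drive a query of the form SELECT TOP (offset + 1) ... ORDER BY created_date DESC, taking the last returned row, as the answer suggests.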

0

Unfortunately, I don't know MSSQL, but I can give a high-level suggestion.

  • Convert each row's date to UNIX time (or some other incrementing integer representation)
  • Divide this value by the maximum across all rows to get a percentage
  • Draw a random number and multiply it by the percentage above
  • Sort the rows by this value and take the top N

This gives more weight to the most recent results. If you want to adjust the relative frequency of older or newer results, you can apply an exponential or logarithmic function to the values before taking the ratio. If you're interested, let me know and I can provide more detail.
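A minimal Python sketch of the steps above (the function name and the (id, created_at) row shape are illustrative assumptions, and raw UNIX times are all close to the maximum, so the spread in weights is small, as the exponential/logarithmic note hints):

```python
import random
from datetime import datetime, timedelta, timezone

def weighted_top_n(rows, n, rng=random.random):
    """Pick n row ids, favouring recent rows.

    rows: list of (row_id, created_at) pairs, created_at a datetime.
    """
    # 1. Represent each date as an incrementing integer (UNIX time).
    timestamps = [dt.timestamp() for _, dt in rows]
    max_ts = max(timestamps)
    scored = []
    for (row_id, _dt), ts in zip(rows, timestamps):
        # 2. Divide by the maximum to get a percentage.
        pct = ts / max_ts
        # 3. Multiply a random number by that percentage.
        scored.append((rng() * pct, row_id))
    # 4. Sort by the score and take the top n.
    scored.sort(reverse=True)
    return [row_id for _, row_id in scored[:n]]
```
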

0

If you can filter the results after they come back from the database, or you can run a query with ORDER BY and walk the results with a reader, then you can add a probabilistic bias to the selection. The higher the bias, the harder it is to pass the test inside the if, and the more random the process becomes.

 var table = ...;           // This is ordered with the latest records first
 int nItems = 10;           // Number of items you want
 double bias = 0.5;         // Probability of skipping a row: 0 = deterministic (top nItems);
                            // values near 1 skip almost every row, so selection is more random
 Random rand = new Random();
 var results = new List<DataRow>();   // For example...

 for (int i = 0; i < table.Rows.Count && results.Count < nItems; i++)
 {
     if (rand.NextDouble() > bias)    // Keep the current row with probability (1 - bias)
         results.Add(table.Rows[i]);  // Or reader.Next()[...]
 }
0

Source: https://habr.com/ru/post/1315246/
