Pulling items from the database with a weighted probability

Say I have a table of records from which I want to pull random rows. However, I want some rows in this table to be shown more often than others (and which rows those are is set by the user). What is the best way to do this using SQL?

The only way I can think of is to create a temporary table, fill it with the rows I want to favor, and then pad it with other randomly selected rows from the table. Is there a better way?

+4
3 answers

One way I can think of is to add a column to the table that holds the running sum of your weights. To pull a record, generate a random number between 0 and the total of all your weights, then pick the row with the lowest rolling sum that is greater than the random number.

For example, if you had four rows with the following weights:

+-----+--------+------------+
| row | weight | rollingsum |
+-----+--------+------------+
| a   |      3 |          3 |
| b   |      3 |          6 |
| c   |      4 |         10 |
| d   |      1 |         11 |
+-----+--------+------------+

Then select a random number n with 0<=n<11 (11 itself is excluded, otherwise no row would match) and return row a if 0<=n<3, b if 3<=n<6, c if 6<=n<10, and d if 10<=n<11.
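The lookup described above can be sketched outside the database as well. This is only an illustration in Python, assuming the four example rows and weights from the table and an integer n drawn from 0 up to (but not including) the total weight; the names `pick_row` and `pick_random_row` are made up for this sketch.

```python
import bisect
import itertools
import random

# Example rows and weights taken from the table above.
rows = ["a", "b", "c", "d"]
weights = [3, 3, 4, 1]

# Running totals of the weights: [3, 6, 10, 11]
rolling_sums = list(itertools.accumulate(weights))
total = rolling_sums[-1]


def pick_row(n):
    """Return the row whose interval contains n, for 0 <= n < total."""
    # bisect_right gives the index of the first rolling sum strictly greater than n.
    return rows[bisect.bisect_right(rolling_sums, n)]


def pick_random_row(rng=random):
    """Draw n uniformly from 0..total-1 and map it to a row."""
    return pick_row(rng.randrange(total))
```

Each row is hit by exactly as many values of n as its weight, so row c comes back about 4 times out of 11. In SQL the same lookup amounts to taking the first row, ordered by rollingsum, whose rollingsum is greater than n.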

Here are some links on generating rolling sums:

http://dev.mysql.com/tech-resources/articles/rolling_sums_in_mysql.html

http://dev.mysql.com/tech-resources/articles/rolling_sums_in_mysql_followup.html

+4

I don't think this can be done very easily in plain SQL. With T-SQL or similar, you can write a loop to duplicate rows, or you can use SQL to generate statements that duplicate rows.

I do not know your probabilistic model, but you can use this approach to achieve the latter. Given these table definitions:

    RowSource
    ---------
    RowID

    UserRowProbability
    ------------------
    UserId
    RowId
    FrequencyMultiplier

You can write a query like this (specific to SQL Server):

    SELECT TOP 100 rs.RowId, urp.FrequencyMultiplier
    FROM RowSource rs
    LEFT JOIN UserRowProbability urp ON rs.RowId = urp.RowId
    ORDER BY ISNULL(urp.FrequencyMultiplier, 1) DESC, NEWID()

This takes care of choosing the set of rows (highest multipliers first, with ties broken randomly by NEWID()) and of returning how many times each row should be repeated. Then, in your application logic, you can duplicate the rows and shuffle the results.
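The application-side step could look roughly like this. It is a sketch in Python, assuming the query results arrive as (RowId, FrequencyMultiplier) pairs and that a missing multiplier means one copy, mirroring the ISNULL(..., 1) in the query; `expand_and_shuffle` is a hypothetical helper name.

```python
import random


def expand_and_shuffle(rows, rng=random):
    """Repeat each row id by its multiplier, then shuffle the result.

    rows: iterable of (row_id, multiplier) pairs; multiplier may be None.
    """
    expanded = []
    for row_id, multiplier in rows:
        # Assumption: a NULL multiplier counts as 1, as in the query's ISNULL.
        copies = 1 if multiplier is None else int(multiplier)
        expanded.extend([row_id] * copies)
    rng.shuffle(expanded)  # shuffles the list in place
    return expanded
```

For example, `expand_and_shuffle([(1, 3), (2, None), (3, 2)])` returns a list of six ids in random order: row 1 three times, row 2 once, and row 3 twice.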

0

Start with 3 tables: users, data, and user data. The user data table records which rows should be preferred for each user.

Then create one view based on the rows of data that the user prefers.

Create a second view in which there is no preferred data.

Create a third view that is the union of the first two. The union must select more rows from the preferred data.

Then, finally, select random rows from the third view.
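One possible reading of these steps, sketched with Python's built-in sqlite3 module so it can be run as-is. The table, column, and view names are invented for the example, and "select more rows from the preferred data" is interpreted here as listing each preferred row three times in the union; both the interpretation and the ratio are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE data (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE user_data (user_id INTEGER, data_id INTEGER);

    INSERT INTO users VALUES (1);
    INSERT INTO data VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    -- user 1 prefers row b
    INSERT INTO user_data VALUES (1, 2);

    -- View 1: rows each user prefers.
    CREATE VIEW preferred AS
        SELECT ud.user_id, d.id AS data_id, d.label
        FROM data d JOIN user_data ud ON ud.data_id = d.id;

    -- View 2: rows each user has not marked as preferred.
    CREATE VIEW not_preferred AS
        SELECT u.id AS user_id, d.id AS data_id, d.label
        FROM users u CROSS JOIN data d
        WHERE NOT EXISTS (
            SELECT 1 FROM user_data ud
            WHERE ud.user_id = u.id AND ud.data_id = d.id
        );

    -- View 3: preferred rows appear three times, others once (assumed ratio).
    CREATE VIEW weighted AS
        SELECT * FROM preferred
        UNION ALL SELECT * FROM preferred
        UNION ALL SELECT * FROM preferred
        UNION ALL SELECT * FROM not_preferred;
""")


def random_label(user_id):
    """Pick one random row for the user from the weighted view."""
    row = conn.execute(
        "SELECT label FROM weighted WHERE user_id = ? ORDER BY RANDOM() LIMIT 1",
        (user_id,),
    ).fetchone()
    return row[0]
```

With this data, row b occupies 3 of the 6 rows in the weighted view for user 1, so it is picked about half the time. Other databases would use their own random ordering function in place of RANDOM().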

0
