Choose at least one from each category?

SQLFiddle Link

I have a SQLite database with a bunch of test/exam questions. Each question belongs to one category.

My table looks like this:
so_questions table

Goal
What I'm trying to do is select 5 random questions, but the result should contain at least one from each category. The goal is to select a random set of questions with questions from each category.

For example, the output could contain question IDs 1, 2, 5, 7, 8 or 2, 3, 6, 7, 8 or 8, 6, 3, 1, 7.

ORDER BY category_id, RANDOM()
I can get a random list of questions from SQLite by running the SQL below, but how do I make sure the result contains a question from each of my categories?

 SELECT * FROM so_questions ORDER BY category_id, RANDOM()

Basically, I'm looking for something like this, but for SQLite.

I would like to get only 5 results, with one (or more) from each category, so that all categories are represented in the result set.

Bounty
Added a bounty because I'm curious whether this can be done in SQLite alone. I can do it in SQLite + Java, but is there any way to do this in SQLite only? :)


+7
3 answers

The key to the answer is that the result contains two kinds of questions: for each category, one question that is constrained to come from that category; and some remaining, unconstrained questions.

First, the constrained questions: we simply select one random record from each category:

    SELECT id, category_id, question_text,
           1 AS constrained,
           max(random()) AS r
    FROM so_questions
    GROUP BY category_id

(This query relies on a feature introduced in SQLite 3.7.11 (Android Jelly Bean or later): in a query such as SELECT a, max(b), the value of a is guaranteed to come from the record that has the maximum value of b.)
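To make that guarantee concrete, here is a minimal sketch that runs the per-category pick through Python's sqlite3 module. The sample rows and the helper name make_sample_db are assumptions for illustration (the original table contents are not shown in the post), and it assumes the bundled SQLite is 3.7.11 or later.

```python
import sqlite3

def make_sample_db():
    # Assumed sample data (not from the original post): 8 questions, 4 categories.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE so_questions ("
                 "id INTEGER PRIMARY KEY, category_id INTEGER, question_text TEXT)")
    conn.executemany(
        "INSERT INTO so_questions VALUES (?, ?, ?)",
        [(1, 1, "q1"), (2, 1, "q2"), (3, 1, "q3"),
         (4, 2, "q4"), (5, 2, "q5"), (6, 2, "q6"),
         (7, 3, "q7"), (8, 4, "q8")])
    conn.commit()
    return conn

conn = make_sample_db()

# Because max() appears in the select list, the bare column id is taken
# from the row that produced the maximum random value within each group.
picked = conn.execute(
    "SELECT id, category_id, max(random()) AS r "
    "FROM so_questions GROUP BY category_id"
).fetchall()

for qid, cat, _r in picked:
    print(f"category {cat}: question {qid}")
```

Each run should print one question per category, with the question varying between runs for categories that have more than one.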

We also need the unconstrained questions (duplicates that are already in the constrained set are filtered out in the next step):

    SELECT id, category_id, question_text,
           0 AS constrained,
           random() AS r
    FROM so_questions

When we combine these two queries with UNION ALL and then group by id, all duplicates end up in the same group. Selecting max(constrained) then ensures that, for groups that contain duplicates, only the constrained record remains (all other questions have just one record per group).

Finally, the ORDER BY ensures that the constrained questions come first, followed by some random other questions:

    SELECT *, max(constrained)
    FROM (SELECT id, category_id, question_text,
                 1 AS constrained,
                 max(random()) AS r
          FROM so_questions
          GROUP BY category_id
          UNION ALL
          SELECT id, category_id, question_text,
                 0 AS constrained,
                 random() AS r
          FROM so_questions)
    GROUP BY id
    ORDER BY constrained DESC, r
    LIMIT 5
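As a rough check of the combined query, the sketch below wraps it in a hypothetical helper, pick_questions, and runs it against assumed sample data (8 questions in 4 categories; the real table contents are not shown in the post). The only change to the SQL is that the limit is passed as a parameter.

```python
import sqlite3

def make_sample_db():
    # Assumed sample data (not from the original post): 8 questions, 4 categories.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE so_questions ("
                 "id INTEGER PRIMARY KEY, category_id INTEGER, question_text TEXT)")
    conn.executemany(
        "INSERT INTO so_questions VALUES (?, ?, ?)",
        [(1, 1, "q1"), (2, 1, "q2"), (3, 1, "q3"),
         (4, 2, "q4"), (5, 2, "q5"), (6, 2, "q6"),
         (7, 3, "q7"), (8, 4, "q8")])
    conn.commit()
    return conn

QUERY = """
SELECT *, max(constrained)
FROM (SELECT id, category_id, question_text,
             1 AS constrained, max(random()) AS r
      FROM so_questions
      GROUP BY category_id
      UNION ALL
      SELECT id, category_id, question_text,
             0 AS constrained, random() AS r
      FROM so_questions)
GROUP BY id
ORDER BY constrained DESC, r
LIMIT ?
"""

def pick_questions(conn, n=5):
    # Hypothetical helper: returns n rows; the first columns are
    # id, category_id, question_text.
    return conn.execute(QUERY, (n,)).fetchall()

conn = make_sample_db()
for row in pick_questions(conn):
    print(row[:3])
```

With four categories and a limit of 5, the result consists of the four constrained rows plus one additional random question, so the limit must be at least the number of categories for every category to appear.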

For earlier SQLite/Android versions, I have not found a solution that avoids a temporary table (the subquery for the constrained questions must be used more than once, but does not stay constant because of random()):

    BEGIN TRANSACTION;

    CREATE TEMPORARY TABLE constrained AS
    SELECT (SELECT id
            FROM so_questions
            WHERE category_id = cats.category_id
            ORDER BY random()
            LIMIT 1) AS id
    FROM (SELECT DISTINCT category_id
          FROM so_questions) AS cats;

    SELECT ids.id, category_id, question_text
    FROM (SELECT id
          FROM (SELECT id, 1 AS c
                FROM constrained
                UNION ALL
                SELECT id, 0 AS c
                FROM so_questions
                WHERE id NOT IN (SELECT id FROM constrained))
          ORDER BY c DESC, random()
          LIMIT 5) AS ids
    JOIN so_questions ON ids.id = so_questions.id;

    DROP TABLE constrained;

    COMMIT TRANSACTION;
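
The temporary-table variant can be exercised the same way. This is only a sketch: the helper name pick_with_temp_table and the sample data are assumptions, the statements are issued one at a time from Python rather than as a single script, and the explicit BEGIN/COMMIT is left out for brevity.

```python
import sqlite3

def make_sample_db():
    # Assumed sample data (not from the original post): 8 questions, 4 categories.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE so_questions ("
                 "id INTEGER PRIMARY KEY, category_id INTEGER, question_text TEXT)")
    conn.executemany(
        "INSERT INTO so_questions VALUES (?, ?, ?)",
        [(1, 1, "q1"), (2, 1, "q2"), (3, 1, "q3"),
         (4, 2, "q4"), (5, 2, "q5"), (6, 2, "q6"),
         (7, 3, "q7"), (8, 4, "q8")])
    conn.commit()
    return conn

def pick_with_temp_table(conn, n=5):
    # Step 1: freeze one random id per category in a temporary table,
    # so the random choice stays constant for the rest of the work.
    conn.execute("""
        CREATE TEMPORARY TABLE constrained AS
        SELECT (SELECT id FROM so_questions
                WHERE category_id = cats.category_id
                ORDER BY random() LIMIT 1) AS id
        FROM (SELECT DISTINCT category_id FROM so_questions) AS cats""")
    try:
        # Step 2: constrained ids first (c = 1), then random other ids.
        return conn.execute("""
            SELECT ids.id, category_id, question_text
            FROM (SELECT id
                  FROM (SELECT id, 1 AS c FROM constrained
                        UNION ALL
                        SELECT id, 0 AS c FROM so_questions
                        WHERE id NOT IN (SELECT id FROM constrained))
                  ORDER BY c DESC, random()
                  LIMIT ?) AS ids
            JOIN so_questions ON ids.id = so_questions.id""", (n,)).fetchall()
    finally:
        # Step 3: always clean up so the helper can be called again.
        conn.execute("DROP TABLE constrained")

conn = make_sample_db()
print(pick_with_temp_table(conn))
```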
+6

Basically, what you are looking for is a way to select the top N max values. I spent 3-4 hours this morning trying to find one. (No success so far; you may have to wait a few more hours.)

As a workaround, you can use GROUP BY, as shown below:

String strQuery = "SELECT * FROM so_questions group by category_id;";

The output is as follows:

[image: output of the query]

This returns exactly what you require.
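
For reference, a small sketch of what the GROUP BY workaround returns, run through Python's sqlite3 module on assumed sample data (8 questions in 4 categories; the helper make_sample_db is hypothetical). It yields one row per category, so four rows here rather than five, and SQLite documents the row chosen for the bare columns as arbitrary rather than random.

```python
import sqlite3

def make_sample_db():
    # Assumed sample data (not from the original post): 8 questions, 4 categories.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE so_questions ("
                 "id INTEGER PRIMARY KEY, category_id INTEGER, question_text TEXT)")
    conn.executemany(
        "INSERT INTO so_questions VALUES (?, ?, ?)",
        [(1, 1, "q1"), (2, 1, "q2"), (3, 1, "q3"),
         (4, 2, "q4"), (5, 2, "q5"), (6, 2, "q6"),
         (7, 3, "q7"), (8, 4, "q8")])
    conn.commit()
    return conn

conn = make_sample_db()

# One row per category; no aggregate is involved, so which row of each
# group supplies id and question_text is up to SQLite.
rows = conn.execute(
    "SELECT * FROM so_questions GROUP BY category_id").fetchall()

for qid, cat, text in rows:
    print(qid, cat, text)
```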

+4

Since this is SQLite (and therefore local): how slow would it be to simply query repeatedly until you have 5 answers covering four different categories, discarding rows with duplicate categories on each iteration?

I think that if each category is equally represented, it is unlikely that you will need more than three iterations, which should take less than a second.

This is not algorithmically nice, but for me, using random() in an SQL statement is not algorithmically nice anyway.
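
One possible reading of this retry idea, sketched in Python with sqlite3: draw 5 random rows and repeat until every category is present. This simplified version redraws the whole set instead of discarding individual rows. The helper name pick_by_retry, the max_tries cap, and the sample data are assumptions for illustration.

```python
import sqlite3

def make_sample_db():
    # Assumed sample data (not from the original post): 8 questions, 4 categories.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE so_questions ("
                 "id INTEGER PRIMARY KEY, category_id INTEGER, question_text TEXT)")
    conn.executemany(
        "INSERT INTO so_questions VALUES (?, ?, ?)",
        [(1, 1, "q1"), (2, 1, "q2"), (3, 1, "q3"),
         (4, 2, "q4"), (5, 2, "q5"), (6, 2, "q6"),
         (7, 3, "q7"), (8, 4, "q8")])
    conn.commit()
    return conn

def pick_by_retry(conn, n=5, max_tries=1000):
    # Number of categories that every accepted draw must cover.
    wanted = conn.execute(
        "SELECT count(DISTINCT category_id) FROM so_questions").fetchone()[0]
    for _attempt in range(max_tries):
        rows = conn.execute(
            "SELECT id, category_id, question_text "
            "FROM so_questions ORDER BY random() LIMIT ?", (n,)).fetchall()
        # Accept the draw only if all categories are represented.
        if len({cat for _qid, cat, _text in rows}) == wanted:
            return rows
    raise RuntimeError("no draw covered every category")

conn = make_sample_db()
print(pick_by_retry(conn))
```

How many attempts are needed depends on how the questions are spread over the categories; categories with very few questions make an accepted draw less likely.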

+2
