SQL: returning the most common value for each person

EDIT: I am using MySQL. I found another post with the same question, but it is for Postgres; I need a MySQL solution.

Get the most common value for each value of another column in SQL

I am asking this question after an extensive search of this site and others, but I have not found a solution that works the way I intend.

I have a people table (recordid, personid, transactionid) and a transaction table (transactionid, rating). I need a single SQL statement that returns the most common rating for each person.

I currently have this SQL statement that returns the most common rating for a given person id. It works and may possibly help others.

 SELECT transactionTable.rating AS MostCommonRating
 FROM personTable, transactionTable
 WHERE personTable.transactionid = transactionTable.transactionid
   AND personTable.personid = 1
 GROUP BY transactionTable.rating
 ORDER BY COUNT(transactionTable.rating) DESC
 LIMIT 1

However, I need a statement that does what the above statement does, but for every person in personTable.

My attempt is below; however, it times out on my MySQL server.

 SELECT personid AS pid,
        (SELECT transactionTable.rating AS MostCommonRating
         FROM personTable, transactionTable
         WHERE personTable.transactionid = transactionTable.transactionid
           AND personTable.personid = pid
         GROUP BY transactionTable.rating
         ORDER BY COUNT(transactionTable.rating) DESC
         LIMIT 1)
 FROM persontable
 GROUP BY personid

Any help you can give me would be much appreciated. Thanks.

PERSONTABLE

 RecordID, PersonID, TransactionID
 1, Adam, 1
 2, Adam, 2
 3, Adam, 3
 4, Ben, 1
 5, Ben, 3
 6, Ben, 4
 7, Caitlin, 4
 8, Caitlin, 5
 9, Caitlin, 1

TRANSACTIONTABLE

 TransactionID, Rating
 1, Good
 2, Bad
 3, Good
 4, Average
 5, Average

The output of the SQL query I'm looking for will be as follows:

OUTPUT

 PersonID, MostCommonRating
 Adam, Good
 Ben, Good
 Caitlin, Average
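
For anyone who wants to reproduce this, the sample data above could be loaded with a script along the following lines. The column types and key constraints are assumptions, since the schema is only described informally here, and the table names follow the spelling used in my queries.

```sql
-- Assumed schema: column types and keys are guesses, not taken from a real DDL.
-- Note: table names can be case-sensitive in MySQL depending on the platform,
-- so keep the spelling consistent with the queries you run.
CREATE TABLE transactionTable (
    transactionid INT PRIMARY KEY,
    rating        VARCHAR(20) NOT NULL
);

CREATE TABLE personTable (
    recordid      INT PRIMARY KEY,
    personid      VARCHAR(20) NOT NULL,
    transactionid INT NOT NULL
);

INSERT INTO transactionTable (transactionid, rating) VALUES
    (1, 'Good'),
    (2, 'Bad'),
    (3, 'Good'),
    (4, 'Average'),
    (5, 'Average');

INSERT INTO personTable (recordid, personid, transactionid) VALUES
    (1, 'Adam', 1),
    (2, 'Adam', 2),
    (3, 'Adam', 3),
    (4, 'Ben', 1),
    (5, 'Ben', 3),
    (6, 'Ben', 4),
    (7, 'Caitlin', 4),
    (8, 'Caitlin', 5),
    (9, 'Caitlin', 1);
```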
3 answers

Preliminary comment

Please learn to use the explicit JOIN notation rather than the old (pre-1992) implicit join notation.

Old style:

 SELECT transactionTable.rating AS MostCommonRating
 FROM personTable, transactionTable
 WHERE personTable.transactionid = transactionTable.transactionid
   AND personTable.personid = 1
 GROUP BY transactionTable.rating
 ORDER BY COUNT(transactionTable.rating) DESC
 LIMIT 1

Preferred style:

 SELECT transactionTable.rating AS MostCommonRating
 FROM personTable
 JOIN transactionTable
   ON personTable.transactionid = transactionTable.transactionid
 WHERE personTable.personid = 1
 GROUP BY transactionTable.rating
 ORDER BY COUNT(transactionTable.rating) DESC
 LIMIT 1

You need an ON condition after each JOIN.

Also, the personID values in the data are strings, not numbers, so you need to write

  WHERE personTable.personid = "Ben" 

for example, to make the query work for the tables shown.


Main answer

You are trying to find an aggregate of an aggregate: in this case, the maximum of a count. Thus, any general solution will involve both MAX and COUNT. You cannot apply MAX directly to COUNT, but you can apply MAX to a column from a subquery where that column happens to be a COUNT.

Build the query up using Test-Driven Query Design (TDQD).

Select person and transaction rating

 SELECT p.PersonID, t.Rating, t.TransactionID
 FROM PersonTable AS p
 JOIN TransactionTable AS t ON p.TransactionID = t.TransactionID

Select person, rating, and number of occurrences of the rating

 SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
 FROM PersonTable AS p
 JOIN TransactionTable AS t ON p.TransactionID = t.TransactionID
 GROUP BY p.PersonID, t.Rating

This result will become a subquery.

Find the maximum number of times a person gets a rating

 SELECT s.PersonID, MAX(s.RatingCount)
 FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
       FROM PersonTable AS p
       JOIN TransactionTable AS t ON p.TransactionID = t.TransactionID
       GROUP BY p.PersonID, t.Rating
      ) AS s
 GROUP BY s.PersonID

Now we know what the maximum count is for each person.

Desired result

To get the result, we need to select the rows from the subquery that have the maximum count. Note that if someone has 2 Good and 2 Bad ratings (and 2 is the maximum number of ratings of the same type for that person), then two rows will be shown for that person.

 SELECT s.PersonID, s.Rating
 FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
       FROM PersonTable AS p
       JOIN TransactionTable AS t ON p.TransactionID = t.TransactionID
       GROUP BY p.PersonID, t.Rating
      ) AS s
 JOIN (SELECT s.PersonID, MAX(s.RatingCount) AS MaxRatingCount
       FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
             FROM PersonTable AS p
             JOIN TransactionTable AS t ON p.TransactionID = t.TransactionID
             GROUP BY p.PersonID, t.Rating
            ) AS s
       GROUP BY s.PersonID
      ) AS m
   ON s.PersonID = m.PersonID AND s.RatingCount = m.MaxRatingCount

If you want the actual rating count too, that is easily selected.
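
As a sketch of what selecting the count could look like, the variant below adds s.RatingCount to the select list of the query above. The inner alias c and the ORDER BY are my additions for readability, not part of the tested query.

```sql
-- Same join as the main answer, but also returning the winning count.
SELECT s.PersonID, s.Rating, s.RatingCount
  FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
          FROM PersonTable AS p
          JOIN TransactionTable AS t ON p.TransactionID = t.TransactionID
         GROUP BY p.PersonID, t.Rating
       ) AS s
  JOIN (SELECT c.PersonID, MAX(c.RatingCount) AS MaxRatingCount
          FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
                  FROM PersonTable AS p
                  JOIN TransactionTable AS t ON p.TransactionID = t.TransactionID
                 GROUP BY p.PersonID, t.Rating
               ) AS c          -- hypothetical alias, chosen to avoid reusing "s"
         GROUP BY c.PersonID
       ) AS m
    ON s.PersonID = m.PersonID AND s.RatingCount = m.MaxRatingCount
 ORDER BY s.PersonID;
```

For the sample data in the question, each person's most common rating occurs twice, so every row should show a RatingCount of 2.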

This is a fairly complex piece of SQL. I would be reluctant to try writing it from scratch. Indeed, I probably would not bother; I would develop it step by step, more or less as shown. But because we debugged the subqueries before using them in bigger expressions, we can be confident of the answer.

WITH clause

Note that Standard SQL provides a WITH clause, which prefixes a SELECT statement and gives a name to a subquery. (It can also be used for recursive queries, but that is not needed here.)

 WITH RatingList AS
      (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
       FROM PersonTable AS p
       JOIN TransactionTable AS t ON p.TransactionID = t.TransactionID
       GROUP BY p.PersonID, t.Rating
      )
 SELECT s.PersonID, s.Rating
 FROM RatingList AS s
 JOIN (SELECT s.PersonID, MAX(s.RatingCount) AS MaxRatingCount
       FROM RatingList AS s
       GROUP BY s.PersonID
      ) AS m
   ON s.PersonID = m.PersonID AND s.RatingCount = m.MaxRatingCount

This is simpler to write. Unfortunately, MySQL versions before 8.0 do not support the WITH clause.
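
On a server that does support WITH and window functions (MySQL added both in 8.0), the tie problem noted earlier can also be avoided by ranking the counts per person and keeping only the first row. The following is a sketch under that assumption and has not been run against the Informix setup mentioned below. The Ranked name, the rn column, and the alphabetical tie-break are my own choices.

```sql
-- Requires MySQL 8.0+ (common table expressions and window functions).
WITH RatingList AS
    (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
       FROM PersonTable AS p
       JOIN TransactionTable AS t ON p.TransactionID = t.TransactionID
      GROUP BY p.PersonID, t.Rating
    ),
Ranked AS
    (SELECT PersonID, Rating, RatingCount,
            -- Highest count first; ties broken alphabetically by Rating,
            -- which is an arbitrary but deterministic choice.
            ROW_NUMBER() OVER (PARTITION BY PersonID
                               ORDER BY RatingCount DESC, Rating) AS rn
       FROM RatingList
    )
SELECT PersonID, Rating AS MostCommonRating
  FROM Ranked
 WHERE rn = 1;
```

Unlike the join against MAX, this returns exactly one row per person even when two ratings are tied for the top count.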


The SQL above has now been tested against IBM Informix Dynamic Server 11.70.FC2 running on Mac OS X 10.7.4. That test exposed the problem addressed in the preliminary comment. The SQL for the main answer worked correctly without needing to be changed.

+24
source

For anyone using Microsoft SQL Server: you have the option to create a custom aggregate function to get the most common value. Example 2 of this blog post by Ahmed Tarek Hassan describes how to do this:

http://developmentsimplyput.blogspot.nl/2013/03/creating-sql-custom-user-defined.html

0
source

Here is a somewhat hacky abuse of the fact that the MAX aggregate function in MySQL compares varchars lexically (as well as doing the expected numeric comparison on integers and floats):

 SELECT PersonID,
        substring(max(concat(lpad(c, 20, '0'), Rating)), 21) AS MostFrequentRating
 FROM (
     SELECT PersonID, Rating, count(*) c
     FROM PERSONTABLE
     INNER JOIN TRANSACTIONTABLE USING(TransactionID)
     GROUP BY PersonID, Rating
 ) AS grouped_ratings
 GROUP BY PersonID;

Which gives the desired result:

 +----------+--------------------+
 | PersonID | MostFrequentRating |
 +----------+--------------------+
 | Adam     | Good               |
 | Ben      | Good               |
 | Caitlin  | Average            |
 +----------+--------------------+

(Note that if a person has more than one mode, this picks the one that sorts highest alphabetically, so, fairly arbitrarily, Good beats Bad and Bad beats Average.)

You should be able to see what MAX is working on by examining the following:

 SELECT PersonID, Rating, count(*) c,
        concat(lpad(count(*), 20, '0'), Rating) AS LexicalMaxMe
 FROM PERSONTABLE
 INNER JOIN TRANSACTIONTABLE USING(TransactionID)
 GROUP BY PersonID, Rating;

Which outputs:

 +----------+---------+---+-----------------------------+
 | PersonID | Rating  | c | LexicalMaxMe                |
 +----------+---------+---+-----------------------------+
 | Adam     | Bad     | 1 | 00000000000000000001Bad     |
 | Adam     | Good    | 2 | 00000000000000000002Good    |
 | Ben      | Average | 1 | 00000000000000000001Average |
 | Ben      | Good    | 2 | 00000000000000000002Good    |
 | Caitlin  | Average | 2 | 00000000000000000002Average |
 | Caitlin  | Good    | 1 | 00000000000000000001Good    |
 +----------+---------+---+-----------------------------+
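
The zero padding is what makes the lexical comparison agree with the numeric one. A small illustration with made-up counts of 10 and 9 (these values are not from the sample data):

```sql
-- Without padding, '10Good' sorts below '9Bad' because '1' < '9',
-- so MAX would pick the rating with the smaller count.
-- With both counts padded to 20 digits, the larger count wins.
SELECT '10Good' > '9Bad' AS unpadded_larger_count_wins,
       CONCAT(LPAD(10, 20, '0'), 'Good') >
       CONCAT(LPAD(9, 20, '0'), 'Bad') AS padded_larger_count_wins;
```

In MySQL the first column should come back as 0 and the second as 1. The width of 20 only needs to be at least as large as the longest count, and the same width has to be used as the offset in substring (here 21).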