Step by step:
First, get the number of rows per (PID, CID). That part is simple:
SELECT PID, CID, COUNT(*) AS cnt FROM checks GROUP BY PID, CID
And you get this result set for your example:
PID  CID  cnt
---  ---  ---
p1   c1   2
p1   c2   3
p2   c1   6
p2   c2   5
Now add COUNT(*) OVER (PARTITION BY PID) to return the number of categories per person:
SELECT PID, CID, COUNT(*) AS cnt,
       COUNT(*) OVER (PARTITION BY PID) AS cat_cnt
FROM checks
GROUP BY PID, CID
The OVER clause turns the "normal" aggregate function COUNT() into a window aggregate function. This makes COUNT(*) work on the grouped row set rather than the original one. So COUNT(*) OVER ... in this case counts rows per PID, which for us means the number of categories per person. And this is the updated result set:
PID  CID  cnt  cat_cnt
---  ---  ---  -------
p1   c1   2    2
p1   c2   3    2
p2   c1   6    2
p2   c2   5    2
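If you want to try this step without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a SQLite build with window function support (3.25 or later), and the sample rows are made up so that they reproduce the counts shown above; the real checks table presumably has more columns.

```python
import sqlite3

# Assumed sample data: (PID, CID) -> number of rows, matching the counts above.
COUNTS = {("p1", "c1"): 2, ("p1", "c2"): 3, ("p2", "c1"): 6, ("p2", "c2"): 5}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checks (PID TEXT, CID TEXT)")
for (pid, cid), n in COUNTS.items():
    conn.executemany("INSERT INTO checks VALUES (?, ?)", [(pid, cid)] * n)

# COUNT(*) is the ordinary aggregate (rows per PID, CID group);
# COUNT(*) OVER (...) then counts the grouped rows per PID.
rows = conn.execute("""
    SELECT PID, CID, COUNT(*) AS cnt,
           COUNT(*) OVER (PARTITION BY PID) AS cat_cnt
    FROM checks
    GROUP BY PID, CID
    ORDER BY PID, CID
""").fetchall()

for row in rows:
    print(row)
```

The ORDER BY at the end is only there to make the printed order predictable.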
One more thing to do: rank the cnt values per PID. This can be tricky, as there may be ties among the top counts. If you always want one row per PID and are completely indifferent to which CID, cnt is returned in case of a tie, you can change the query as follows:
SELECT PID, CID, COUNT(*) AS cnt,
       COUNT(*) OVER (PARTITION BY PID) AS cat_cnt,
       ROW_NUMBER() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC) AS rn
FROM checks
GROUP BY PID, CID
And it will look like this:
PID  CID  cnt  cat_cnt  rn
---  ---  ---  -------  --
p1   c1   2    2        2
p1   c2   3    2        1
p2   c1   6    2        1
p2   c2   5    2        2
At this point, the result set contains all the data needed to produce the final output; you just need to filter on cat_cnt and rn. However, you cannot do this directly in the same query, because window functions are evaluated after WHERE and HAVING and so cannot be referenced there. Instead, use the last query as a derived table, either as a common table expression (WITH) or as a "normal" subquery. The following is an example using WITH:
WITH grouped AS (
  SELECT PID, CID, COUNT(*) AS cnt,
         COUNT(*) OVER (PARTITION BY PID) AS cat_cnt,
         ROW_NUMBER() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC) AS rn
  FROM checks
  GROUP BY PID, CID
)
SELECT PID, CID, cnt
FROM grouped
WHERE cat_cnt > 1 AND rn = 1
;
Here's the SQL Fiddle demo (using Oracle): http://sqlfiddle.com/#!4/cd62d/8
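As a rough local check of the complete query, the sketch below runs it through Python's sqlite3 module (assuming SQLite 3.25 or later rather than Oracle). The p1 and p2 counts mirror the example; p3 is an invented person with a single category, added only to show what the cat_cnt > 1 filter removes.

```python
import sqlite3

# Assumed sample data; p3 is hypothetical and has only one category.
COUNTS = {
    ("p1", "c1"): 2, ("p1", "c2"): 3,
    ("p2", "c1"): 6, ("p2", "c2"): 5,
    ("p3", "c1"): 4,
}

QUERY = """
WITH grouped AS (
    SELECT PID, CID, COUNT(*) AS cnt,
           COUNT(*) OVER (PARTITION BY PID) AS cat_cnt,
           ROW_NUMBER() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC) AS rn
    FROM checks
    GROUP BY PID, CID
)
SELECT PID, CID, cnt
FROM grouped
WHERE cat_cnt > 1 AND rn = 1
ORDER BY PID
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checks (PID TEXT, CID TEXT)")
for (pid, cid), n in COUNTS.items():
    conn.executemany("INSERT INTO checks VALUES (?, ?)", [(pid, cid)] * n)

# One row per person with more than one category: the most frequent CID.
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

p3 drops out because its cat_cnt is 1, even though its only row has rn = 1.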
To expand a bit more on the ranking part: if you still want to return one CID, cnt per PID, but would prefer more control over which row is picked as the "winner", you will need to add a tie-breaker to the ORDER BY of the ranking function. For example, you can replace the original expression,
ROW_NUMBER() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC) AS rn
with this:
ROW_NUMBER() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC, CID) AS rn
i.e. the tie-breaker is CID: of two or more CIDs sharing the top count, the one that sorts first wins.
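To see the tie-breaker at work, here is a small sketch with made-up data in which c1 and c2 are tied at three rows each for p1. It again assumes SQLite 3.25 or later via Python's sqlite3 module; the rows are inserted with c2 first on purpose, so the result cannot be explained by insertion order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checks (PID TEXT, CID TEXT)")
# Hypothetical data: c2 and c1 are tied at 3 rows each; c2 is inserted first.
conn.executemany("INSERT INTO checks VALUES (?, ?)",
                 [("p1", "c2")] * 3 + [("p1", "c1")] * 3)

# CID as the second sort key makes the choice among tied rows deterministic.
winner = conn.execute("""
    WITH grouped AS (
        SELECT PID, CID, COUNT(*) AS cnt,
               ROW_NUMBER() OVER (
                   PARTITION BY PID ORDER BY COUNT(*) DESC, CID
               ) AS rn
        FROM checks
        GROUP BY PID, CID
    )
    SELECT PID, CID, cnt FROM grouped WHERE rn = 1
""").fetchall()

print(winner)
```

With the tie-breaker in place, c1 is returned because it sorts before c2.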
However, you may instead want to return all the top counts per PID. In that case, use RANK() or DENSE_RANK() instead of ROW_NUMBER() (and without a tie-breaker), for example:
RANK() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC) AS rn
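The sketch below compares the two functions on made-up data where c1 and c2 are tied at the top and c3 trails behind. As before, it assumes SQLite 3.25 or later through Python's sqlite3 module, and the column aliases rnk and dense_rnk are only illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checks (PID TEXT, CID TEXT)")
# Hypothetical data: c1 and c2 tie with 3 rows each, c3 has a single row.
conn.executemany("INSERT INTO checks VALUES (?, ?)",
                 [("p1", "c1")] * 3 + [("p1", "c2")] * 3 + [("p1", "c3")])

ranked = conn.execute("""
    SELECT CID, COUNT(*) AS cnt,
           RANK()       OVER (PARTITION BY PID ORDER BY COUNT(*) DESC) AS rnk,
           DENSE_RANK() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC) AS dense_rnk
    FROM checks
    GROUP BY PID, CID
    ORDER BY CID
""").fetchall()

# Rows whose rank is 1: every CID that shares the top count.
top = [(cid, cnt) for cid, cnt, rnk, _ in ranked if rnk == 1]

print(ranked)
print(top)
```

Both tied rows get rank 1 under either function, so filtering on rank 1 returns both. The functions differ only further down: after a two-way tie, RANK() assigns 3 to the next row and DENSE_RANK() assigns 2.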