Oracle LISTAGG: top 3 most frequent values listed in a single column, grouped by ID

I have a problem with an SQL query. It can probably be done in "plain" SQL, but I am fairly sure I need some kind of group concatenation (I cannot use MySQL), so the second option is the Oracle dialect, since the database will be Oracle. Let's say we have the following entity:

Table: Visits to Veterinarians

Visit_Id, Animal_id, Veterinarian_id, Sickness_code 

Say there are 100 visits (100 visit_id values), and each animal_id has about 20 visits.

I need to create a SELECT grouped by Animal_id with 3 columns:

  • animal_id
  • the second column shows the total number of flu visits for the given animal (say flu is sickness_code = 5)
  • the third column shows the top three sickness codes for each animal (the most frequent codes for that particular animal_id)

How can I do this? The first and second columns are simple, but what about the third? I know I need to use Oracle's LISTAGG, OVER (PARTITION BY ...), COUNT and RANK. I tried to combine them, but it did not work out as I expected :( What should this query look like?

+2
sql oracle rank listagg
Jun 30 '16 at 15:48
2 answers

I think the most natural way uses two levels of aggregation, as well as some window functions here and there:

    select vas.animal_id,
           sum(case when sickness_code = 5 then cnt else 0 end) as numflu,
           listagg(case when seqnum <= 3 then sickness_code end, ',')
             within group (order by seqnum) as top3sicknesses
    from (select animal_id, sickness_code, count(*) as cnt,
                 row_number() over (partition by animal_id
                                    order by count(*) desc) as seqnum
          from visits
          group by animal_id, sickness_code
         ) vas
    group by vas.animal_id;

This exploits the fact that listagg() ignores NULL values.
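As a minimal sketch of that NULL behavior on its own, the query below builds five hypothetical rows from DUAL (the column name `n` and alias `first3` are made up for illustration and are not part of the question's schema). The CASE expression yields NULL for every row past the third, and LISTAGG skips those rows rather than emitting empty entries:

```sql
-- Hypothetical input: the numbers 1..5 generated from DUAL.
-- Rows with n > 3 produce NULL from the CASE, and LISTAGG ignores NULLs.
select listagg(case when n <= 3 then n end, ',')
         within group (order by n) as first3
from (select level as n
      from dual
      connect by level <= 5);
```

This should return the single value 1,2,3, which is the same mechanism that limits top3sicknesses to the first three codes per animal in the query above.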

+1
Jun 30 '16 at 16:01

Here is some sample data:

    create table VET as
    select rownum+1 Visit_Id,
           mod(rownum+1,5) Animal_id,
           cast(NULL as number) Veterinarian_id,
           trunc(10*dbms_random.value)+1 Sickness_code
    from dual
    connect by level <= 100;

Query

Basically, the subqueries do the following:

aggregate the visit counts per animal and sickness code, plus the flu count (repeated on every row of the animal)

calculate the RANK (if you need exactly 3 records, use ROW_NUMBER - see the discussion below)

filter the top 3 RANKs

LISTAGG the result

    with agg as (
      select Animal_id, Sickness_code, count(*) cnt,
             -- Caution: this window sum runs over the grouped rows, so cnt_flu is 1
             -- if the animal has any code-5 visits and 0 otherwise. For the number
             -- of flu visits, sum count(*) instead of 1 in the CASE expression.
             sum(case when SICKNESS_CODE = 5 then 1 else 0 end)
               over (partition by animal_id) as cnt_flu
      from vet
      group by Animal_id, Sickness_code
    ),
    agg2 as (
      select ANIMAL_ID, SICKNESS_CODE, CNT, cnt_flu,
             rank() OVER (PARTITION BY ANIMAL_ID ORDER BY cnt DESC) rnk
      from agg
    ),
    agg3 as (
      select ANIMAL_ID, SICKNESS_CODE, CNT, CNT_FLU, RNK
      from agg2
      where rnk <= 3
    )
    select ANIMAL_ID,
           max(CNT_FLU) CNT_FLU,
           LISTAGG(SICKNESS_CODE||'('||CNT||')', ', ') WITHIN GROUP (ORDER BY rnk) as cnt_lts
    from agg3
    group by ANIMAL_ID
    order by 1;

gives

     ANIMAL_ID    CNT_FLU CNT_LTS
    ---------- ---------- ---------------------------------------------
             0          1 6(5), 1(4), 9(3)
             1          1 1(5), 3(4), 2(3), 8(3)
             2          0 1(5), 10(3), 4(3), 6(3), 7(3)
             3          1 5(4), 2(3), 4(3), 7(3)
             4          1 2(5), 10(4), 1(2), 3(2), 5(2), 7(2), 8(2)

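I intentionally show each Sickness_code with its visit count in parentheses to demonstrate that the top 3 can contain ties, which you need to handle. Check how the RANK function treats them: all codes tied on a count share the same rank, so more than three codes can be listed. Using ROW_NUMBER ordered only by the count is not deterministic in this case, because which of the tied codes lands in the top 3 is arbitrary.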
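If exactly three codes per animal are required, one possible way to make ROW_NUMBER repeatable is to add a tiebreaker to the window's ORDER BY. The sketch below runs against the VET sample table from this answer and assumes that, among codes with equal counts, the lowest sickness_code should win. That rule is an illustrative choice, not something the question specifies, and the alias `rn` is likewise made up here:

```sql
-- Sketch: top 3 codes per animal with a deterministic tiebreak.
-- Assumption: ties on the visit count are broken by the lower sickness_code.
select animal_id, sickness_code, cnt, rn
from (
  select animal_id, sickness_code, count(*) as cnt,
         row_number() over (partition by animal_id
                            order by count(*) desc, sickness_code) as rn
  from vet
  group by animal_id, sickness_code
)
where rn <= 3
order by animal_id, rn;
```

Because (animal_id, sickness_code) is unique after the GROUP BY, the combined ordering has no remaining ties, so the same three rows per animal come back on every run over the same data. The filtered rows can then be fed to LISTAGG in the same way as the RANK-based version.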

0
Jun 30 '16 at 16:14