Using MEDIAN alongside the MAX, MIN, and AVG functions in MySQL

I have the following MySQL query that works fine:

    select count(*) as `# of Data points`,
           name,
           max((QNTY_Sell/QNTYDelivered)*1000) as `MAX Thousand Price`,
           min((QNTY_Sell/QNTYDelivered)*1000) as `MIN Thousand Price`,
           avg((QNTY_Sell/QNTYDelivered)*1000) as `MEAN Thousand Price`
    from table_name
    where year(date) >= 2012
      and name like "%the_name%"
      and QNTYDelivered > 0
      and QNTY_Sell > 0
    group by name
    order by name;

Now I also want to add a result column that gives me the median for each row. In a perfect world, the SELECT would look like this:

 median((QNTY_Sell/QNTYDelivered)*1000) as `MEDIAN Thousand Price` 

A Google search for a median function in MySQL led me to this answer, which looks fine if you want the median of the dataset for the entire table: "An easy way to calculate the median with MySQL".

The difference here is that I group the data in my table by the name column and want the median for each group.

Does anyone know a great way to do this?

Thanks!

2 answers

You can calculate the median with GROUP BY in MySQL, even though it has no built-in MEDIAN function.

Consider the table:

    name        v
    Acrington   200.00
    Acrington   200.00
    Acrington   300.00
    Acrington   400.00
    Bulingdon   200.00
    Bulingdon   300.00
    Bulingdon   400.00
    Bulingdon   500.00
    Cardington  100.00
    Cardington  149.00
    Cardington  151.00
    Cardington  300.00
    Cardington  300.00

For each row, you can count how many values with the same name are smaller. You can also count how many are less than or equal:

    name        v       <   <=
    Acrington   200.00  0   2
    Acrington   200.00  0   2
    Acrington   300.00  2   3
    Acrington   400.00  3   4
    Bulingdon   200.00  0   1
    Bulingdon   300.00  1   2
    Bulingdon   400.00  2   3
    Bulingdon   500.00  3   4
    Cardington  100.00  0   1
    Cardington  149.00  1   2
    Cardington  151.00  2   3
    Cardington  300.00  3   5
    Cardington  300.00  3   5

These counts come from the query:

    SELECT name, v,
           (SELECT COUNT(1) FROM sale WHERE v < o.v AND name = o.name) AS ls,
           (SELECT COUNT(1) FROM sale WHERE v <= o.v AND name = o.name) AS lse
    FROM sale o
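To try the counting query without a MySQL server, here is a minimal sketch that loads the sample rows into an in-memory SQLite database and runs the same correlated subqueries. SQLite stands in for MySQL here as an assumption; the query uses only standard SQL, but the `ROWS` constant and the `rank_counts` helper are names invented for this illustration.

```python
import sqlite3

# Sample data from the answer: (name, v) pairs for the `sale` table.
ROWS = [
    ("Acrington", 200.0), ("Acrington", 200.0),
    ("Acrington", 300.0), ("Acrington", 400.0),
    ("Bulingdon", 200.0), ("Bulingdon", 300.0),
    ("Bulingdon", 400.0), ("Bulingdon", 500.0),
    ("Cardington", 100.0), ("Cardington", 149.0),
    ("Cardington", 151.0), ("Cardington", 300.0),
    ("Cardington", 300.0),
]

COUNT_SQL = """
    SELECT name, v,
           (SELECT COUNT(1) FROM sale WHERE v <  o.v AND name = o.name) AS ls,
           (SELECT COUNT(1) FROM sale WHERE v <= o.v AND name = o.name) AS lse
    FROM sale o
    ORDER BY name, v
"""


def rank_counts(rows):
    """Return (name, v, ls, lse) tuples: ls counts smaller values in the
    same group, lse counts values that are smaller or equal."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sale (name TEXT, v REAL)")
    conn.executemany("INSERT INTO sale VALUES (?, ?)", rows)
    result = conn.execute(COUNT_SQL).fetchall()
    conn.close()
    return result


for row in rank_counts(ROWS):
    print(row)
```

The ORDER BY is added only to make the output easy to compare with the table above; it is not part of the original query.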

A value is a median candidate when half the number of elements in its group lies between its "less than" count and its "less than or equal" count:

  • Acrington has 4 elements. Half of this is 2, which is in the range 0..2 (value 200.00) as well as in the range 2..3 (value 300.00).

  • Bulingdon also has 4 elements. 2 is in the range 1..2 (value 300.00) and in the range 2..3 (value 400.00).

  • Cardington has 5 elements. Half of this is 2.5, which lies between 2 and 3, corresponding to the value 151.00.
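The selection rule in the list above can be sketched in plain Python. This is only an illustration of the counting logic, not the answer's SQL: `median_candidates` is a hypothetical helper, and `bisect_left` / `bisect_right` on the sorted group play the role of the "less than" and "less than or equal" counts.

```python
from bisect import bisect_left, bisect_right


def median_candidates(values):
    """Return the values v for which half the group size lies between
    the count of values < v and the count of values <= v."""
    ordered = sorted(values)
    half = len(ordered) / 2
    return [
        v for v in ordered
        # bisect_left  -> number of elements strictly smaller than v
        # bisect_right -> number of elements smaller than or equal to v
        if bisect_left(ordered, v) <= half <= bisect_right(ordered, v)
    ]


def median_from_candidates(values):
    """Average the smallest and largest candidate, as the answer does."""
    candidates = median_candidates(values)
    return (min(candidates) + max(candidates)) / 2


print(median_candidates([200.0, 200.0, 300.0, 400.0]))
print(median_from_candidates([100.0, 149.0, 151.0, 300.0, 300.0]))
```

Duplicate values each appear as their own candidate, which is harmless because only the minimum and maximum of the candidates are used.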

The median is the average of the smallest and largest candidate values. The candidates are returned by:

    SELECT cs.name, v
    FROM (SELECT name, v,
                 (SELECT COUNT(1) FROM sale WHERE v < o.v AND name = o.name) AS ls,
                 (SELECT COUNT(1) FROM sale WHERE v <= o.v AND name = o.name) AS lse
          FROM sale o) cs
    JOIN (SELECT name, COUNT(1)*.5 AS cn FROM sale GROUP BY name) cc
      ON cs.name = cc.name
    WHERE cn BETWEEN ls AND lse

Which gives:

    Acrington   200.00
    Acrington   200.00
    Acrington   300.00
    Bulingdon   300.00
    Bulingdon   400.00
    Cardington  151.00

Finally, we can get the median:

    SELECT name, (MAX(v)+MIN(v))/2
    FROM (SELECT cs.name, v
          FROM (SELECT name, v,
                       (SELECT COUNT(1) FROM sale WHERE v < o.v AND name = o.name) AS ls,
                       (SELECT COUNT(1) FROM sale WHERE v <= o.v AND name = o.name) AS lse
                FROM sale o) cs
          JOIN (SELECT name, COUNT(1)*.5 AS cn FROM sale GROUP BY name) cc
            ON cs.name = cc.name
          WHERE cn BETWEEN ls AND lse
         ) AS medians
    GROUP BY name

Result:

    Acrington   250.000000
    Bulingdon   350.000000
    Cardington  151.000000
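As a cross-check, the sketch below runs the complete median query against the sample data in an in-memory SQLite database and compares it with Python's `statistics.median`. Using SQLite instead of MySQL is an assumption made so the example is self-contained. The helper names (`grouped_medians`, `reference_medians`) and the explicit `AS name` / `AS v` aliases are additions for this illustration, and `0.5` is written out in place of `.5`.

```python
import sqlite3
from statistics import median

# Sample data from the answer: (name, v) pairs for the `sale` table.
ROWS = [
    ("Acrington", 200.0), ("Acrington", 200.0),
    ("Acrington", 300.0), ("Acrington", 400.0),
    ("Bulingdon", 200.0), ("Bulingdon", 300.0),
    ("Bulingdon", 400.0), ("Bulingdon", 500.0),
    ("Cardington", 100.0), ("Cardington", 149.0),
    ("Cardington", 151.0), ("Cardington", 300.0),
    ("Cardington", 300.0),
]

MEDIAN_SQL = """
    SELECT name, (MAX(v) + MIN(v)) / 2 AS median
    FROM (SELECT cs.name AS name, cs.v AS v
          FROM (SELECT name, v,
                       (SELECT COUNT(1) FROM sale
                         WHERE v <  o.v AND name = o.name) AS ls,
                       (SELECT COUNT(1) FROM sale
                         WHERE v <= o.v AND name = o.name) AS lse
                FROM sale o) cs
          JOIN (SELECT name, COUNT(1) * 0.5 AS cn
                FROM sale GROUP BY name) cc
            ON cs.name = cc.name
          WHERE cn BETWEEN ls AND lse) AS medians
    GROUP BY name
    ORDER BY name
"""


def grouped_medians(rows):
    """Run the answer's query and return {name: median}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sale (name TEXT, v REAL)")
    conn.executemany("INSERT INTO sale VALUES (?, ?)", rows)
    result = dict(conn.execute(MEDIAN_SQL).fetchall())
    conn.close()
    return result


def reference_medians(rows):
    """Compute the per-group median in Python for comparison."""
    groups = {}
    for name, v in rows:
        groups.setdefault(name, []).append(v)
    return {name: median(vs) for name, vs in groups.items()}


print(grouped_medians(ROWS))
print(grouped_medians(ROWS) == reference_medians(ROWS))
```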

The only way I found to do this is through string manipulation:
a list of all values is built with GROUP_CONCAT, and the median value is then extracted with SUBSTRING_INDEX.

    SELECT count(*) AS `# of Data points`,
           name,
           max((QNTY_Sell/QNTYDelivered)*1000) AS `MAX Thousand Price`,
           min((QNTY_Sell/QNTYDelivered)*1000) AS `MIN Thousand Price`,
           avg((QNTY_Sell/QNTYDelivered)*1000) AS `MEAN Thousand Price`,
           CASE (count(*) % 2)
             WHEN 1 THEN
               SUBSTRING_INDEX(
                 SUBSTRING_INDEX(
                   group_concat((QNTY_Sell/QNTYDelivered)*1000
                                ORDER BY (QNTY_Sell/QNTYDelivered)*1000 SEPARATOR ','),
                   ',', (count(*) + 1) / 2),
                 ',', -1)
             ELSE
               (SUBSTRING_INDEX(
                  SUBSTRING_INDEX(
                    group_concat((QNTY_Sell/QNTYDelivered)*1000
                                 ORDER BY (QNTY_Sell/QNTYDelivered)*1000 SEPARATOR ','),
                    ',', count(*) / 2),
                  ',', -1)
                + SUBSTRING_INDEX(
                    SUBSTRING_INDEX(
                      group_concat((QNTY_Sell/QNTYDelivered)*1000
                                   ORDER BY (QNTY_Sell/QNTYDelivered)*1000 SEPARATOR ','),
                      ',', (count(*) + 1) / 2),
                    ',', -1)) / 2
           END median
    FROM sales
    WHERE year(date) >= 2012
      AND name LIKE "%art.%"
      AND QNTYDelivered > 0
      AND QNTY_Sell > 0
    GROUP BY name
    ORDER BY name;

The CASE is needed to check whether we have a single median value (odd number of values) or two middle values (even number of values). In the second case, the median is the average of the two middle values.
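The string-based idea can be followed step by step in Python. This is a rough sketch, not MySQL itself: `substring_index` is a hypothetical helper that imitates SUBSTRING_INDEX for simple cases, and `median_via_concat` joins the sorted values the way GROUP_CONCAT with ORDER BY would. For the even case the sketch uses the explicit integer positions `n // 2` and `n // 2 + 1`.

```python
def substring_index(text, delim, count):
    """Imitate MySQL's SUBSTRING_INDEX: everything before the count-th
    delimiter from the left (count > 0) or after the count-th delimiter
    from the right (count < 0)."""
    if count == 0:
        return ""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


def median_via_concat(values):
    """Median obtained by picking elements out of a comma-joined string."""
    ordered = sorted(values)
    concat = ",".join(str(v) for v in ordered)  # GROUP_CONCAT ... ORDER BY
    n = len(ordered)

    def nth(k):
        # Keep the first k items, then take the last of them: the k-th item.
        return float(substring_index(substring_index(concat, ",", k), ",", -1))

    if n % 2 == 1:
        # Odd count: a single middle element.
        return nth((n + 1) // 2)
    # Even count: average of the two middle elements.
    return (nth(n // 2) + nth(n // 2 + 1)) / 2


print(median_via_concat([200.0, 200.0, 300.0, 400.0]))
print(median_via_concat([100.0, 149.0, 151.0, 300.0, 300.0]))
```

The nested call mirrors the query: the inner SUBSTRING_INDEX truncates the list after the k-th value, and the outer one with -1 keeps only that last value.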

SQLFiddle
