Here is another way to compute the median, inspired by this post, using SUBSTRING_INDEX and GROUP_CONCAT. I'm not sure how it performs on large tables compared with the row-number method described by @fancyPants, but on small tables (~20K rows) it runs very fast.
SET SESSION group_concat_max_len = 1000000;

SELECT
    created_at,
    (
        -- lower middle value: element FLOOR((n+1)/2) of the sorted list
        CAST(SUBSTRING_INDEX(
                 SUBSTRING_INDEX(GROUP_CONCAT(price ORDER BY price SEPARATOR ','),
                                 ',', FLOOR((COUNT(*) + 1) / 2)),
                 ',', -1) AS DECIMAL)
        +
        -- upper middle value: element FLOOR((n+2)/2), the same element when n is odd
        CAST(SUBSTRING_INDEX(
                 SUBSTRING_INDEX(GROUP_CONCAT(price ORDER BY price SEPARATOR ','),
                                 ',', FLOOR((COUNT(*) + 2) / 2)),
                 ',', -1) AS DECIMAL)
    ) / 2.0 AS median_price
FROM mediana
GROUP BY created_at;
Here is the result for the SQL Fiddle from the question (the fiddle seems to be broken, so I ran the query in MySQL itself against the table defined in its script):
+------------+--------------+
| created_at | median_price |
+------------+--------------+
| 2012-03-05 |       3.5000 |
| 2012-03-06 |       1.5000 |
+------------+--------------+
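GROUP_CONCAT essentially builds a string representation of the sorted price array for each created_at date. The two nested SUBSTRING_INDEX calls then pick out the middle value(s), i.e. the median. Two GROUP_CONCAT expressions are needed, with their results averaged, to handle the case where a created_at date has an even number of price values; with an odd count both expressions return the same middle element, so the average is that element. Note that CAST(... AS DECIMAL) without a precision means DECIMAL(10,0) in MySQL, so non-integer prices would be rounded; specify a precision and scale, e.g. DECIMAL(20,4), if your prices have decimals.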
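To make the index arithmetic easier to follow, here is a minimal Python sketch of the same idea. It is only a model, not MySQL: substring_index is a hypothetical helper that imitates SUBSTRING_INDEX for non-zero counts, median_via_concat is a made-up name, and the prices are assumed to be integers.

```python
from decimal import Decimal


def substring_index(s: str, delim: str, count: int) -> str:
    """Rough model of MySQL's SUBSTRING_INDEX for count != 0."""
    parts = s.split(delim)
    if count > 0:
        # Everything before the count-th delimiter, counted from the left.
        return delim.join(parts[:count])
    # Everything after the |count|-th delimiter, counted from the right.
    return delim.join(parts[count:])


def median_via_concat(prices):
    """Median computed the way the query does it, on a joined string."""
    # Stand-in for GROUP_CONCAT(price ORDER BY price SEPARATOR ',')
    joined = ",".join(str(p) for p in sorted(prices))
    n = len(prices)
    # Keep the first FLOOR((n+1)/2) elements, then take the last of them.
    lower = substring_index(substring_index(joined, ",", (n + 1) // 2), ",", -1)
    # Same with FLOOR((n+2)/2); identical to `lower` when n is odd.
    upper = substring_index(substring_index(joined, ",", (n + 2) // 2), ",", -1)
    return (Decimal(lower) + Decimal(upper)) / 2


print(median_via_concat([4, 1, 3, 2]))  # even count: average of 2 and 3
print(median_via_concat([5, 1, 3]))     # odd count: the middle element, 3
```

For an even count the two positions differ by one, so the two middle elements are averaged; for an odd count both positions coincide.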
UPDATE:
Note that the result of GROUP_CONCAT is limited by default to 1024 bytes (the group_concat_max_len system variable), see here. Longer results are truncated, which leads to a wrong median. You can raise the limit with the command SET SESSION group_concat_max_len = N;, where N is a larger value, if you expect long results. I added this setting to the code snippet above. I chose 1,000,000, but you can also use a different value.
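As a rough way to judge how large N needs to be, the sketch below estimates the length of the joined string for one group. It is a Python approximation rather than anything MySQL provides: concat_length is a hypothetical helper, and the group of 1000 integer prices is invented for illustration.

```python
DEFAULT_GROUP_CONCAT_MAX_LEN = 1024  # MySQL default, in bytes


def concat_length(prices, separator=","):
    """Byte length of the separator-joined list of prices."""
    joined = separator.join(str(p) for p in sorted(prices))
    return len(joined.encode("utf-8"))


# Hypothetical group: 1000 rows with integer prices 1..1000.
prices = list(range(1, 1001))
needed = concat_length(prices)
print(needed)                                 # 3892
print(needed > DEFAULT_GROUP_CONCAT_MAX_LEN)  # True: the default would truncate
```

With the default limit, a group like this would be cut off before the middle elements, so the extracted values would no longer be the median.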
You can also check your results with COUNT(*) and OFFSET for one of your GROUP BY values. For instance,
1. First, get the number of rows for a specific GROUP BY value:
SELECT COUNT(*) FROM mediana WHERE created_at = '2012-03-06';
2. Let X be the number of rows returned by step 1. Divide X by 2 to get half its value, Y.
3. Use Y as an offset to find the median. Substitute the actual numbers in the queries below, since MySQL's LIMIT/OFFSET accepts integer constants, not expressions such as (Y-1).
a. If Y is an integer (X is even), run both
SELECT price FROM mediana WHERE created_at = '2012-03-06' ORDER BY price LIMIT 1 OFFSET (Y-1);
and
SELECT price FROM mediana WHERE created_at = '2012-03-06' ORDER BY price LIMIT 1 OFFSET Y;
and average the two results to get the median value.
b. If Y is not an integer (X is odd), round Y down to the nearest integer (call it W) and use it as the single offset,
SELECT price FROM mediana WHERE created_at = '2012-03-06' ORDER BY price LIMIT 1 OFFSET W;
and that will be your median value.
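The steps above can be sketched as a small script. This is only an illustration under several assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database instead of MySQL, median_by_offset is a hypothetical helper name, and the rows and dates are invented rather than taken from the question's table.

```python
import sqlite3


def median_by_offset(conn, created_at):
    """Median price for one created_at value via COUNT(*) and LIMIT/OFFSET."""
    # Step 1: X = number of rows in the group.
    (x,) = conn.execute(
        "SELECT COUNT(*) FROM mediana WHERE created_at = ?", (created_at,)
    ).fetchone()
    if x == 0:
        return None
    query = (
        "SELECT price FROM mediana WHERE created_at = ? "
        "ORDER BY price LIMIT 1 OFFSET ?"
    )
    if x % 2 == 0:
        # Step 3a: Y = X / 2 is an integer; average rows at offsets Y-1 and Y.
        y = x // 2
        (low,) = conn.execute(query, (created_at, y - 1)).fetchone()
        (high,) = conn.execute(query, (created_at, y)).fetchone()
        return (low + high) / 2
    # Step 3b: X is odd; W is Y rounded down, the offset of the middle row.
    w = x // 2
    (mid,) = conn.execute(query, (created_at, w)).fetchone()
    return mid


# Hypothetical sample data, not the table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mediana (created_at TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO mediana VALUES (?, ?)",
    [("2020-01-01", 4), ("2020-01-01", 1), ("2020-01-01", 3), ("2020-01-01", 2),
     ("2020-01-02", 7), ("2020-01-02", 5), ("2020-01-02", 9)],
)
print(median_by_offset(conn, "2020-01-01"))  # 2.5
print(median_by_offset(conn, "2020-01-02"))  # 7
```

Passing the offset as a bound parameter avoids the need to paste the computed number into the query text by hand.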