Calculating the median grouped by day

I have a query that calculates the median value across all the rows in a table:

    SELECT avg(t1.price) AS median_val
    FROM (
        SELECT @rownum := @rownum + 1 AS `row_number`, d.price
        FROM mediana d, (SELECT @rownum := 0) r
        WHERE 1
        ORDER BY d.price
    ) AS t1,
    (
        SELECT count(*) AS total_rows
        FROM mediana d
        WHERE 1
    ) AS t2
    WHERE 1
      AND t1.row_number >= total_rows / 2
      AND t1.row_number <= total_rows / 2 + 1;

Now I need the median not over the whole table, but grouped by date. Is that possible? Here is a fiddle: http://sqlfiddle.com/#!2/7cf27. The result I expect is 2013-03-06 - 1.5, 2013-03-05 - 3.5.

2 answers

I hope I have not overcomplicated things, but here is what I came up with:

    SELECT sq.created_at, avg(sq.price) AS median_val
    FROM (
        SELECT t1.row_number, t1.price, t1.created_at
        FROM (
            -- number the rows within each date, restarting at 1 when the date changes
            SELECT IF(@prev != d.created_at, @rownum := 1, @rownum := @rownum + 1) AS `row_number`,
                   d.price,
                   @prev := d.created_at AS created_at
            FROM mediana d, (SELECT @rownum := 0, @prev := NULL) r
            ORDER BY created_at, price
        ) AS t1
        INNER JOIN (
            -- number of rows per date
            SELECT count(*) AS total_rows, created_at
            FROM mediana d
            GROUP BY created_at
        ) AS t2 ON t1.created_at = t2.created_at
        WHERE 1 = 1
          AND t1.row_number >= t2.total_rows / 2
          AND t1.row_number <= t2.total_rows / 2 + 1
    ) sq
    GROUP BY sq.created_at;

What I did here is basically reset the row number to 1 whenever the date changes (which is why ordering by created_at first is important) and include the date in the output so we can group by it. In the subquery that computes the total number of rows, I also included created_at, so that the two subqueries can be joined on it.
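To make the filter on `row_number` easier to follow, here is a minimal Python sketch that mirrors the logic of the query step by step. It is only an illustration, not a replacement for the SQL: the function name `median_by_row_number` and the sample rows are made up for this example and are not the data from the fiddle.

```python
from collections import defaultdict


def median_by_row_number(rows):
    """Median price per date; rows is a list of (created_at, price) pairs."""
    # Counterpart of subquery t2: total number of rows per date.
    totals = defaultdict(int)
    for created_at, _ in rows:
        totals[created_at] += 1

    # Counterpart of subquery t1: walk the rows ordered by (date, price)
    # and restart the row number at 1 whenever the date changes.
    picked = defaultdict(list)
    prev, rownum = None, 0
    for created_at, price in sorted(rows):
        rownum = rownum + 1 if created_at == prev else 1
        prev = created_at
        half = totals[created_at] / 2  # true division, like "/" in MySQL
        # Counterpart of the WHERE clause: keep only the middle row(s).
        if half <= rownum <= half + 1:
            picked[created_at].append(price)

    # Counterpart of the outer avg(...) GROUP BY created_at.
    return {day: sum(p) / len(p) for day, p in picked.items()}


# Hypothetical sample data, not the rows from the fiddle.
sample = [
    ("2012-03-05", 5), ("2012-03-05", 2), ("2012-03-05", 4), ("2012-03-05", 3),
    ("2012-03-06", 2), ("2012-03-06", 1),
    ("2012-03-07", 7), ("2012-03-07", 1), ("2012-03-07", 4),
]
print(median_by_row_number(sample))
```

With an even number of rows per date the condition keeps two rows (for 4 rows: numbers 2 and 3), with an odd number it keeps one (for 3 rows: number 2), so averaging the kept rows gives the median in both cases.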

Here is another way to compute the median, inspired by this post, using SUBSTRING_INDEX and GROUP_CONCAT. I'm not sure how it performs on large tables compared with the row-number method described by @fancyPants, but on small tables (~20K rows) it runs very fast.

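    SET SESSION group_concat_max_len = 1000000;

    -- note: DECIMAL without a precision is DECIMAL(10,0) in MySQL; give it a
    -- scale, e.g. DECIMAL(20,4), if prices can have fractional parts
    SELECT created_at,
           (
             -- lower middle value: item number FLOOR((n+1)/2) of the sorted list
             CAST(SUBSTRING_INDEX(
                    SUBSTRING_INDEX(GROUP_CONCAT(price ORDER BY price SEPARATOR ','),
                                    ',', FLOOR((COUNT(*) + 1) / 2)),
                    ',', -1) AS DECIMAL)
             +
             -- upper middle value: item number FLOOR((n+2)/2) of the sorted list
             CAST(SUBSTRING_INDEX(
                    SUBSTRING_INDEX(GROUP_CONCAT(price ORDER BY price SEPARATOR ','),
                                    ',', FLOOR((COUNT(*) + 2) / 2)),
                    ',', -1) AS DECIMAL)
           ) / 2.0 AS median_price
    FROM mediana
    GROUP BY created_at;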

Here is the result for the fiddle given in the question (the fiddle itself seems to be broken, so I ran the query directly in MySQL on the table defined there):

    +------------+--------------+
    | created_at | median_price |
    +------------+--------------+
    | 2012-03-05 |       3.5000 |
    | 2012-03-06 |       1.5000 |
    +------------+--------------+

GROUP_CONCAT essentially builds a comma-separated string of the sorted prices for each created_at date. The two nested SUBSTRING_INDEX calls then pick out the middle value(s) of that list, i.e. the median. Two GROUP_CONCAT expressions are needed, and their results averaged, to handle the case where a created_at date has an even number of prices. With an odd number, both expressions return the same middle value, so the average is that value.
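The index arithmetic can be checked outside the database. The following Python sketch imitates the string approach for a single group; `median_via_concat` is a hypothetical helper written for this illustration, and it converts to `float` where the SQL casts to DECIMAL.

```python
def median_via_concat(prices):
    """Median of a non-empty list, imitating GROUP_CONCAT + SUBSTRING_INDEX."""
    n = len(prices)  # COUNT(*)
    # GROUP_CONCAT(price ORDER BY price SEPARATOR ',')
    joined = ",".join(str(p) for p in sorted(prices))

    def nth(k):
        # SUBSTRING_INDEX(SUBSTRING_INDEX(joined, ',', k), ',', -1):
        # keep the first k items, then take the last of those (1-based k).
        return float(joined.split(",")[:k][-1])

    lower = nth((n + 1) // 2)  # FLOOR((COUNT(*) + 1) / 2)
    upper = nth((n + 2) // 2)  # FLOOR((COUNT(*) + 2) / 2)
    return (lower + upper) / 2.0


print(median_via_concat([5, 2, 4, 3]))  # even count: items 2 and 3 are averaged
print(median_via_concat([7, 1, 4]))     # odd count: item 2 is used twice
```

For n = 4 the two indices are 2 and 3, and for n = 3 both are 2, which is why one expression covers both the even and the odd case.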

UPDATE:

It should be noted that the result of GROUP_CONCAT has a default maximum length of 1024 bytes, see here. Longer results are truncated, which leads to a wrong median. You can raise the limit for the current session with SET SESSION group_concat_max_len = N;, where N is a larger value, if you expect long results. I added this setting to the code snippet above. I chose 1,000,000, but you can use a different value.
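A rough Python simulation shows how truncation distorts the result. It assumes that the concatenated string is simply cut off at the length limit and that asking SUBSTRING_INDEX for more items than the string contains returns the whole string; `median_with_limit` and the generated prices are made up for this illustration.

```python
def median_with_limit(prices, max_len):
    """Median via a joined string that is cut off at max_len characters."""
    n = len(prices)  # COUNT(*) is not affected by the truncation
    joined = ",".join(str(p) for p in sorted(prices))[:max_len]

    def nth(k):
        # If fewer than k items survive, this falls back to the last
        # surviving item, like SUBSTRING_INDEX on a too-short string.
        return float(joined.split(",")[:k][-1])

    return (nth((n + 1) // 2) + nth((n + 2) // 2)) / 2.0


# Hypothetical group of 1000 prices: 1000, 1001, ..., 1999 (true median 1499.5).
prices = list(range(1000, 2000))
print(median_with_limit(prices, 1024))       # limit too small: wrong value
print(median_with_limit(prices, 1_000_000))  # limit large enough: 1499.5
```

The joined string for this group is 4999 characters long, so with a limit of 1024 the middle items are no longer in the string and the computed value comes from the last item that survived the cut.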

You can also check your results by hand using COUNT(*) and OFFSET for one of your GROUP BY values. For instance:

  1. First, get the number of rows for a specific GROUP BY value:

    SELECT COUNT(*) FROM mediana WHERE created_at = '2012-03-06';

  2. Let X be the number of rows returned in step 1. Divide X by 2 to get Y.

  3. Use Y as an offset to find the median. Substitute the actual number into each query, since MySQL does not accept an expression such as (Y-1) after OFFSET.

    a. If Y is an integer (X is even), run both

    SELECT price FROM mediana WHERE created_at = '2012-03-06' ORDER BY price LIMIT 1 OFFSET (Y-1);

    and

    SELECT price FROM mediana WHERE created_at = '2012-03-06' ORDER BY price LIMIT 1 OFFSET Y;

    and average the two results to get the median.

    b. If Y is not an integer (X is odd), round Y down to the nearest integer (call it W) and use it as the single offset:

    SELECT price FROM mediana WHERE created_at = '2012-03-06' ORDER BY price LIMIT 1 OFFSET W;

    That value is the median.
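The manual check can be sketched in Python as follows, with list indexing standing in for OFFSET (both are zero-based). Note that in the odd case the sketch rounds X/2 down so that the offset lands on the middle row. `median_via_offset` is a hypothetical name, and the comparison against `statistics.median` is only there as a sanity check.

```python
import statistics


def median_via_offset(prices):
    """Median of a non-empty list using the COUNT(*) / OFFSET recipe."""
    ordered = sorted(prices)  # ORDER BY price
    x = len(ordered)          # SELECT COUNT(*)
    if x % 2 == 0:
        y = x // 2
        # OFFSET Y-1 and OFFSET Y give the two middle rows.
        return (ordered[y - 1] + ordered[y]) / 2
    w = x // 2                # X/2 rounded down
    return ordered[w]         # OFFSET W gives the single middle row


for group in ([5, 2, 4, 3], [7, 1, 4], [9]):
    # Both columns should agree for every group.
    print(group, median_via_offset(group), statistics.median(group))
```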
