MySQL median calculation

I'm having trouble calculating the median (not the average) of a list of values.

I found this article: "An easy way to calculate the median with MySQL".

It contains the following query, which I don't fully understand:

SELECT x.val from data x, data y GROUP BY x.val HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2 

If I have a time column and want to calculate its median, what do x and y refer to?

+7
mysql statistics median
7 answers

val is your time column; x and y are two references to the data table (you can write data AS x, data AS y).
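To see what the self-join computes: SIGN(1-SIGN(y.val-x.val)) is 1 when y.val <= x.val and 0 otherwise, so the SUM counts how many values are at or below x.val. Here is a rough Python sketch of that logic, assuming all values are distinct (the function names are made up for illustration and are not part of the article):

```python
def sign(v):
    """Same result as SQL SIGN(): -1, 0 or 1."""
    return (v > 0) - (v < 0)


def self_join_median(values):
    """For each x, count the y values with y <= x, as the HAVING clause does.

    Assumption: all values are distinct.
    """
    n = len(values)
    matches = []
    for x in values:
        # sign(1 - sign(y - x)) is 1 when y <= x and 0 when y > x
        at_or_below = sum(sign(1 - sign(y - x)) for y in values)
        # mirrors HAVING SUM(...) = (COUNT(*)+1)/2, using true division like MySQL's /
        if at_or_below == (n + 1) / 2:
            matches.append(x)
    return matches


print(self_join_median([9, 0, 3, 7, 1]))  # odd number of values
print(self_join_median([0, 1, 3, 7]))     # even number of values
```

Under this reading, an even number of values makes (n + 1) / 2 a non-integer, so no row matches and the result is empty.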

EDIT: To avoid calculating your sums twice, you can save intermediate results.

 CREATE TEMPORARY TABLE average_user_total_time (SELECT SUM(time) AS time_taken FROM scores WHERE created_at >= '2010-10-10' and created_at <= '2010-11-11' GROUP BY user_id); 

Then you can calculate the median over these values, which are in the named table.

EDIT: The temporary table will not work here, because the median query joins the table with itself and MySQL cannot refer to a TEMPORARY table more than once in the same query. You can try a regular table with the MEMORY storage engine, or simply repeat the subquery that calculates the values for the median twice in your query. Beyond that, I don't see another solution. That doesn't mean there is no better way; maybe someone else will come up with an idea.

+2

I suggest a faster way.

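Get the number of rows and store the middle position in a user variable:

SELECT CAST(CEIL(COUNT(*)/2) AS UNSIGNED) INTO @middlevalue FROM data;

Then take the largest value among the first @middlevalue rows of the sorted subquery. LIMIT does not accept a user variable directly, so use a prepared statement:

PREPARE stmt FROM 'SELECT MAX(val) FROM (SELECT val FROM data ORDER BY val LIMIT ?) x'; EXECUTE stmt USING @middlevalue;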

I tested this on a dataset of 5x10e6 random numbers, and it found the median in less than 10 seconds.

This will find an arbitrary percentile if you replace COUNT(*)/2 with COUNT(*)*n, where n is the percentile as a fraction (.5 for the median, .75 for the 75th percentile, etc.).
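The idea above (sort, keep the first CEIL(COUNT(*)*n) rows, take the largest) can be sketched in Python as follows. This is only an illustration: percentile_by_rank is a made-up name, and the sketch assumes a non-empty list.

```python
import math


def percentile_by_rank(values, n=0.5):
    """Sort, keep the first ceil(count * n) values, return the largest of them.

    Assumption: values is non-empty and 0 < n <= 1.
    """
    ordered = sorted(values)
    # keep at least one row, so very small fractions still return a value
    keep = max(1, math.ceil(len(ordered) * n))
    return max(ordered[:keep])


print(percentile_by_rank([0, 1, 3, 7, 9, 10, 11]))     # odd count
print(percentile_by_rank([0, 1, 3, 7, 9, 10]))         # even count
print(percentile_by_rank([1, 2, 3, 4, 5, 6, 7, 8], 0.75))
```

Note that for an even number of rows this returns the lower of the two middle values, not their mean.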

+10

First, try to understand what the median is: it is the middle value in a sorted list of values.

Once you understand this, the approach consists of two steps:

  • sort the values (ascending or descending, it does not matter)
  • select the middle value (if the number of values is even, take the mean of the two middle values)

Example:

 Median of 0 1 3 7 9 10: 5 (because (7+3)/2=5)
 Median of 0 1 3 7 9 10 11: 7 (because 7 is the middle value)
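The two steps can be written down directly. A minimal Python sketch, assuming a non-empty list of numbers (the function name is just for illustration):

```python
def median(values):
    """Sort the values, then pick the middle one (or the mean of the two middle ones)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd count: a single middle value exists
        return ordered[mid]
    # even count: mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([0, 1, 3, 7, 9, 10]))
print(median([0, 1, 3, 7, 9, 10, 11]))
```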

So, to sort dates, you need a numeric value; you can convert them to timestamps (seconds elapsed since the epoch) and then apply the median definition.

+1

Finding the median in MySQL using GROUP_CONCAT

Query:

 SELECT IF(count%2=1,
           SUBSTRING_INDEX(SUBSTRING_INDEX(data_str,",",pos),",",-1),
           (SUBSTRING_INDEX(SUBSTRING_INDEX(data_str,",",pos),",",-1)
            + SUBSTRING_INDEX(SUBSTRING_INDEX(data_str,",",pos+1),",",-1))/2) AS median
 FROM (SELECT GROUP_CONCAT(val ORDER BY val) data_str,
              CEILING(COUNT(*)/2) pos,
              COUNT(*) AS count
       FROM data) temp;

Explanation:

Sorting is done with ORDER BY inside the GROUP_CONCAT function.

The position (pos) and the total number of elements (count) are determined. Using CEILING to compute the position lets us use the SUBSTRING_INDEX function in the next steps.

The count tells us whether there is an even or odd number of values.

  • Odd count: directly select the element at pos using SUBSTRING_INDEX.
  • Even count: find the elements at pos and pos + 1, then add them and divide by 2 to get the median.

Finally, the median is calculated.
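To make the string handling easier to follow, here is a rough Python imitation of the same steps. The substring_index helper below is only a stand-in for MySQL's SUBSTRING_INDEX (for a non-zero count), and all names are made up for this sketch:

```python
import math


def substring_index(text, delim, count):
    """Rough stand-in for MySQL SUBSTRING_INDEX with a non-zero count."""
    parts = text.split(delim)
    # positive count: everything before the count-th delimiter from the left;
    # negative count: everything after the count-th delimiter from the right
    kept = parts[:count] if count > 0 else parts[count:]
    return delim.join(kept)


def nth_item(data_str, pos):
    """Item at 1-based position pos in a comma-separated string."""
    return float(substring_index(substring_index(data_str, ",", pos), ",", -1))


def group_concat_median(values):
    ordered = sorted(values)
    data_str = ",".join(str(v) for v in ordered)  # like GROUP_CONCAT(val ORDER BY val)
    count = len(ordered)
    pos = math.ceil(count / 2)
    if count % 2 == 1:
        return nth_item(data_str, pos)
    return (nth_item(data_str, pos) + nth_item(data_str, pos + 1)) / 2


print(group_concat_median([0, 1, 3, 7, 9, 10]))
print(group_concat_median([0, 1, 3, 7, 9, 10, 11]))
```

One thing to keep in mind with the SQL version: GROUP_CONCAT output is truncated at group_concat_max_len (1024 by default), so for larger tables that setting would need to be raised.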

+1

If you have a table R with a column named A, and you want the median of A, you can do the following:

 SELECT A FROM R R1
 WHERE (SELECT COUNT(A) FROM R R2 WHERE R2.A < R1.A)
     = (SELECT COUNT(A) FROM R R3 WHERE R3.A > R1.A);

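Note: This only works if there are no duplicate values in A. NULL values are also not allowed. With an even number of rows (and no duplicates), the query returns nothing, because no row has equally many values below and above it.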
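The condition in that query is "as many values below as above". A small Python sketch of the same check (balanced_rows is a hypothetical name) shows where it works and where it does not:

```python
def balanced_rows(values):
    """Return the values that have as many values below them as above them."""
    result = []
    for a in values:
        below = sum(1 for other in values if other < a)
        above = sum(1 for other in values if other > a)
        if below == above:
            result.append(a)
    return result


print(balanced_rows([0, 1, 3, 7, 9]))  # odd count, distinct values
print(balanced_rows([0, 1, 3, 7]))     # even count
print(balanced_rows([1, 2, 2, 3, 4]))  # duplicates: the median 2 is not found
```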

+1

The easiest way my friend and I have found. Enjoy!

 SELECT COUNT(*) INTO @c FROM station;
 SELECT ROUND((@c+1)/2) INTO @final;
 SELECT ROUND(lat_n,4) FROM station a WHERE @final-1 = (SELECT COUNT(lat_n) FROM station b WHERE b.lat_n > a.lat_n);
+1

Here is a solution that is easy to understand. Just replace Your_Column and Your_Table to suit your requirements.

 SET @r = 0;
 SELECT AVG(Your_Column)
 FROM (SELECT (@r := @r + 1) AS r, Your_Column FROM Your_Table ORDER BY Your_Column) Temp
 WHERE r = (SELECT CEIL(COUNT(*) / 2) FROM Your_Table)
    OR r = (SELECT FLOOR((COUNT(*) / 2) + 1) FROM Your_Table);

Originally taken from this thread.

0