Getting data for a histogram chart

Is there a way to specify bin sizes in MySQL? Right now, I'm trying to execute the following SQL query:

select total, count(total) from faults GROUP BY total; 

The data it generates is good enough, but there are too many rows. I need a way to group the data into predefined bins. I can do this in a scripting language, but is there a way to do it directly in SQL?

Example:

 +-------+--------------+
 | total | count(total) |
 +-------+--------------+
 |    30 |            1 |
 |    31 |            2 |
 |    33 |            1 |
 |    34 |            3 |
 |    35 |            2 |
 |    36 |            6 |
 |    37 |            3 |
 |    38 |            2 |
 |    41 |            1 |
 |    42 |            5 |
 |    43 |            1 |
 |    44 |            7 |
 |    45 |            4 |
 |    46 |            3 |
 |    47 |            2 |
 |    49 |            3 |
 |    50 |            2 |
 |    51 |            3 |
 |    52 |            4 |
 |    53 |            2 |
 |    54 |            1 |
 |    55 |            3 |
 |    56 |            4 |
 |    57 |            4 |
 |    58 |            2 |
 |    59 |            2 |
 |    60 |            4 |
 |    61 |            1 |
 |    63 |            2 |
 |    64 |            5 |
 |    65 |            2 |
 |    66 |            3 |
 |    67 |            5 |
 |    68 |            5 |
 +-------+--------------+

What I'm looking for:

 +------------+---------------+
 | total      | count(total)  |
 +------------+---------------+
 | 30 - 40    | 23            |
 | 40 - 50    | 15            |
 | 50 - 60    | 51            |
 | 60 - 70    | 45            |
 +------------+---------------+

I suppose this cannot be achieved in a direct way, but a link to any associated stored procedure would also be fine.

+67
mysql binning histogram
Nov 19 '09 at 17:02
10 answers

This post is about a super quick-and-dirty way to create a histogram in MySQL for numeric values.

There are several other ways to create histograms that are better and more flexible, using CASE statements and other types of complex logic. This method wins me over time and time again, because it is so easy to modify for each use case and so short and concise. Here's how you do it:

 SELECT ROUND(numeric_value, -2) AS bucket,
        COUNT(*) AS COUNT,
        RPAD('', LN(COUNT(*)), '*') AS bar
 FROM my_table
 GROUP BY bucket;

Just change numeric_value to whatever your column is, change the rounding increment, and that's it. I made the bars logarithmic so that they do not grow too much when you have large values.

The numeric_value should be offset in the ROUND operation, based on the rounding increment, to ensure that the first bucket contains as many elements as the following buckets.

E.g. with ROUND(numeric_value, -1), a numeric_value in the range [0, 4] (5 elements) will be placed in the first bucket, while [5, 14] (10 elements) goes in the second and [15, 24] in the third, unless numeric_value is offset appropriately via ROUND(numeric_value - 5, -1).

This is an example of such a query on some random data, and it looks pretty nice. Good enough for a quick evaluation of the data.

 +--------+----------+-----------------+
 | bucket | count    | bar             |
 +--------+----------+-----------------+
 |   -500 |        1 |                 |
 |   -400 |        2 | *               |
 |   -300 |        2 | *               |
 |   -200 |        9 | **              |
 |   -100 |       52 | ****            |
 |      0 |  5310766 | *************** |
 |    100 |    20779 | **********      |
 |    200 |     1865 | ********        |
 |    300 |      527 | ******          |
 |    400 |      170 | *****           |
 |    500 |       79 | ****            |
 |    600 |       63 | ****            |
 |    700 |       35 | ****            |
 |    800 |       14 | ***             |
 |    900 |       15 | ***             |
 |   1000 |        6 | **              |
 |   1100 |        7 | **              |
 |   1200 |        8 | **              |
 |   1300 |        5 | **              |
 |   1400 |        2 | *               |
 |   1500 |        4 | *               |
 +--------+----------+-----------------+

Some notes: ranges that have no matching rows will not appear in the result at all, so you will never see a zero in the count column. Also, I am using the ROUND function here. You can just as easily replace it with TRUNCATE if you feel that makes more sense for your data.
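To see how the logarithmic bar length behaves, here is a small Python sketch that mirrors RPAD('', LN(COUNT(*)), '*') outside the database. It assumes the fractional length is rounded to the nearest integer, which is consistent with the bar widths in the sample output above. The helper name log_bar is made up for illustration.

```python
import math


def log_bar(count: int) -> str:
    """Return a bar of '*' whose length is ln(count), rounded to the nearest integer."""
    if count < 1:
        # ln is undefined for 0; an empty bucket would not be returned by GROUP BY anyway.
        return ""
    return "*" * round(math.log(count))


# A few counts taken from the sample output above.
for count in (1, 2, 9, 52, 20779, 5310766):
    print(f"{count:>8} {log_bar(count)}")
```

Because the length grows with the logarithm, a bucket with five million rows is only about four times as wide as one with fifty rows.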

I found it here http://blog.shlomoid.com/2011/08/how-to-quickly-create-histogram-in.html

+124
Apr 28 '12 at 11:40

Mike Del Gaudio's answer is how I do it, but with a little change:

 select floor(mycol/10)*10 as bin_floor, count(*) from mytable group by 1 order by 1 

The advantage? You can make the bins as large or as small as you want. Bins of size 100? floor(mycol/100)*100. Bins of size 5? floor(mycol/5)*5.

Bernardo.
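The same floor-based binning can be sketched in Python to check which bin a value lands in before running the query. The sample values and the helper name bin_floor are hypothetical; Python's // operator floors toward negative infinity, so it behaves like FLOOR for negative values too.

```python
from collections import Counter


def bin_floor(value: int, bin_size: int) -> int:
    """Lower edge of the bin containing value, like floor(mycol/bin_size)*bin_size."""
    return (value // bin_size) * bin_size


# Hypothetical sample values standing in for mycol.
values = [30, 31, 31, 36, 41, 42, 49, 50, 58, 67]

# Count how many values fall into each bin of width 10.
histogram = Counter(bin_floor(v, 10) for v in values)
for edge in sorted(histogram):
    print(edge, histogram[edge])
```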

+20
Sep 14 '11 at 3:16

 SELECT b.*, count(*) as total
 FROM bins b
 left outer join table1 a on a.value between b.min_value and b.max_value
 group by b.min_value

The table bins contains the columns min_value and max_value, which define the bins. Note that the operator in "join ... on x BETWEEN y and z" is inclusive.

table1 is the name of the data table

+16
Nov 19 '09 at 17:38

Ofri Raviv's answer is very close, but incorrect. count(*) will be 1 even if there are zero results in a histogram interval. The query needs to be modified to use a conditional sum:

 SELECT b.*, SUM(a.value IS NOT NULL) AS total
 FROM bins b
 LEFT JOIN a ON a.value BETWEEN b.min_value AND b.max_value
 GROUP BY b.min_value;
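The difference between count(*) and the conditional sum is easiest to see on a bin that has no matching rows. The sketch below runs both expressions side by side, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained; the join and aggregate behave the same way here). The table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bins (min_value INTEGER, max_value INTEGER);
    CREATE TABLE a (value INTEGER);
    INSERT INTO bins VALUES (30, 39), (40, 49), (50, 59);
    INSERT INTO a VALUES (31), (35), (52);  -- nothing falls into 40..49
""")

# star_count counts joined rows, including the single NULL-padded row that
# LEFT JOIN produces for an empty bin; total counts only real matches.
query = """
    SELECT b.min_value, b.max_value,
           COUNT(*) AS star_count,
           SUM(a.value IS NOT NULL) AS total
    FROM bins b
    LEFT JOIN a ON a.value BETWEEN b.min_value AND b.max_value
    GROUP BY b.min_value, b.max_value
    ORDER BY b.min_value
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For the empty 40..49 bin, star_count is 1 while total is 0, which is the discrepancy described above.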
+9
Jul 20 '10 at 19:07

 select "30-34" as TotalRange, count(total) as Count from table_name where total between 30 and 34
 union (
 select "35-39" as TotalRange, count(total) as Count from table_name where total between 35 and 39)
 union (
 select "40-44" as TotalRange, count(total) as Count from table_name where total between 40 and 44)
 union (
 select "45-49" as TotalRange, count(total) as Count from table_name where total between 45 and 49)
 etc ....

As long as there are not too many intervals, this is a pretty good solution.

+8
Sep 10 '12 at 19:22

I made a procedure that can be used to automatically generate a temporary table of bins, according to a specified number or size, for later use with Ofri Raviv's solution.

 DELIMITER //
 CREATE PROCEDURE makebins(numbins INT, binsize FLOAT) # binsize may be NULL for auto-size
 BEGIN
   SELECT FLOOR(MIN(colval)) INTO @binmin FROM yourtable;
   SELECT CEIL(MAX(colval)) INTO @binmax FROM yourtable;
   IF binsize IS NULL THEN
     # CEIL here may prevent the potential creation of a very small extra bin due to rounding errors, but is no good where floats are needed.
     SET binsize = CEIL((@binmax-@binmin)/numbins);
   END IF;
   SET @currlim = @binmin;
   WHILE @currlim + binsize < @binmax DO
     INSERT INTO bins VALUES (@currlim, @currlim+binsize);
     SET @currlim = @currlim + binsize;
   END WHILE;
   INSERT INTO bins VALUES (@currlim, @binmax); # last bin ends at the maximum
 END //
 DELIMITER ;

 DROP TABLE IF EXISTS bins; # be careful if you have a bins table of your own.
 CREATE TEMPORARY TABLE bins (
   minval INT, maxval INT, # or FLOAT, if needed
   KEY (minval), KEY (maxval) ); # keys could perhaps help if using a lot of bins; normally negligible

 CALL makebins(20, NULL); # Using 20 bins of automatic size here.

 SELECT bins.*, count(*) AS total
 FROM bins
 LEFT JOIN yourtable ON yourtable.colval BETWEEN bins.minval AND bins.maxval
 GROUP BY bins.minval

This will generate the histogram count only for the bins that are populated. David West ought to be right in his correction, but for some reason unpopulated bins do not appear in the result for me (despite the use of a LEFT JOIN; I do not understand why).

+3
Aug 10 '10 at 11:12

That should work. Not so elegant, but still:

 select count(mycol - (mycol mod 10)) as freq,
        mycol - (mycol mod 10) as label
 from mytable
 group by mycol - (mycol mod 10)
 order by mycol - (mycol mod 10) ASC

via Mike Del Gaudio

+3
Feb 17 2018-11-17T00:

 select case
          when total >= 30 and total <= 40 then "30-40"
          when total >= 40 and total <= 50 then "40-50"
          else "50-60"
        end as TotalRange,
        count(total)
 from faults
 group by TotalRange
+1
Jul 25 '15 at 17:18

In addition to the excellent answer https://stackoverflow.com/a/312628/2322/ ... you can use the phpMyAdmin chart tool for a nice result.

0
Apr 14 '15 at 10:27

Equal-width binning into a given number of bins:

 WITH bins AS (
   SELECT min(col) AS min_value,
          ((max(col)-min(col)) / 10.0) + 0.0000001 AS bin_width
   FROM tab
 )
 SELECT tab.*,
        floor((col-bins.min_value) / bins.bin_width) AS bin
 FROM tab, bins;

Note that the 0.0000001 is there to make sure that records with a value equal to max(col) do not end up in a bin of their own. The additive constant also ensures that the query does not fail on division by zero when all the values in the column are identical.

Also note that the number of bins (10 in the example) should be written with a decimal point to avoid integer division (the unadjusted bin_width can be a decimal).
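The role of the small additive constant can be checked with a few lines of Python that apply the same formula as the query. This is only a sketch of the arithmetic, not of the SQL itself; the function name equal_width_bins and the sample values are made up.

```python
import math


def equal_width_bins(values, num_bins=10, eps=1e-7):
    """Assign each value a bin index using floor((v - min) / bin_width)."""
    min_value = min(values)
    # eps keeps bin_width non-zero and keeps max(values) inside the last bin.
    bin_width = (max(values) - min_value) / num_bins + eps
    return [math.floor((v - min_value) / bin_width) for v in values]


values = [0.0, 2.5, 5.0, 9.9, 10.0]
print(equal_width_bins(values))  # [0, 2, 4, 9, 9]

# All values identical: without eps this would divide by zero.
print(equal_width_bins([3, 3, 3]))  # [0, 0, 0]
```

The maximum value 10.0 lands in bin 9 rather than in an eleventh bin. A side effect of the constant is that a value sitting exactly on an interior bin edge, such as 5.0 here, falls into the lower bin.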

0
Dec 31 '15 at 15:17