MySQL: how does GROUP BY work with columns without aggregate functions?

Question


I am a bit confused about how the group by command works in mysql.

Suppose I have a table:

mysql> select recordID, IPAddress, date, httpMethod from Log_Analysis_Records_dalhousieShort;
+----------+-----------------+---------------------+-------------------------------------------------+
| recordID | IPAddress       | date                | httpMethod                                      |
+----------+-----------------+---------------------+-------------------------------------------------+
|        1 | 64.68.88.22     | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0         |
|        2 | 64.68.88.166    | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0           |
|        3 | 129.173.177.214 | 2003-07-09 00:01:23 | GET / HTTP/1.1                                  |
|        4 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /include/fcs_style.css HTTP/1.1             |
|        5 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /include/main_page.css HTTP/1.1             |
|        6 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /images/bigportaltopbanner.gif HTTP/1.1     |
|        7 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /images/right_1.jpg HTTP/1.1                |
|        8 | 64.68.88.165    | 2003-07-09 00:02:43 | GET /studentservices/responsible.shtml HTTP/1.0 |
|        9 | 64.68.88.165    | 2003-07-09 00:02:44 | GET /news/sports/basketball.shtml HTTP/1.0      |
|       10 | 64.68.88.34     | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0          |
|       11 | 129.173.159.98  | 2003-07-09 00:03:46 | GET / HTTP/1.1                                  |
|       12 | 129.173.159.98  | 2003-07-09 00:03:46 | GET /include/fcs_style.css HTTP/1.1             |
|       13 | 129.173.159.98  | 2003-07-09 00:03:46 | GET /include/main_page.css HTTP/1.1             |
|       14 | 129.173.159.98  | 2003-07-09 00:03:48 | GET /images/bigportaltopbanner.gif HTTP/1.1     |
|       15 | 129.173.159.98  | 2003-07-09 00:03:48 | GET /images/left_1g.jpg HTTP/1.1                |
|       16 | 129.173.159.98  | 2003-07-09 00:03:48 | GET /images/webcam.gif HTTP/1.1                 |
+----------+-----------------+---------------------+-------------------------------------------------+

When I execute the following statement, how does MySQL choose which recordID to include, given that a whole range of recordID values would be correct? Does it just pick one that matches?

mysql> select recordID, IPAddress, date, httpMethod from Log_Analysis_Records_dalhousieShort GROUP BY IPADDRESS;
+----------+-----------------+---------------------+-------------------------------------------------+
| recordID | IPAddress       | date                | httpMethod                                      |
+----------+-----------------+---------------------+-------------------------------------------------+
|       11 | 129.173.159.98  | 2003-07-09 00:03:46 | GET / HTTP/1.1                                  |
|        3 | 129.173.177.214 | 2003-07-09 00:01:23 | GET / HTTP/1.1                                  |
|        8 | 64.68.88.165    | 2003-07-09 00:02:43 | GET /studentservices/responsible.shtml HTTP/1.0 |
|        2 | 64.68.88.166    | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0           |
|        1 | 64.68.88.22     | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0         |
|       10 | 64.68.88.34     | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0          |
+----------+-----------------+---------------------+-------------------------------------------------+
6 rows in set (0.00 sec)

For this table, the values of max(date) and min(date) seem logical to me, but I am confused about how recordID and httpMethod are chosen.

Is it safe to use two aggregate functions in one command?

mysql> select recordID, IPAddress, min(date), max(date), httpMethod from Log_Analysis_Records_dalhousieShort GROUP BY IPADDRESS;
+----------+-----------------+---------------------+---------------------+-------------------------------------------------+
| recordID | IPAddress       | min(date)           | max(date)           | httpMethod                                      |
+----------+-----------------+---------------------+---------------------+-------------------------------------------------+
|       11 | 129.173.159.98  | 2003-07-09 00:03:46 | 2003-07-09 00:03:48 | GET / HTTP/1.1                                  |
|        3 | 129.173.177.214 | 2003-07-09 00:01:23 | 2003-07-09 00:01:23 | GET / HTTP/1.1                                  |
|        8 | 64.68.88.165    | 2003-07-09 00:02:43 | 2003-07-09 00:02:44 | GET /studentservices/responsible.shtml HTTP/1.0 |
|        2 | 64.68.88.166    | 2003-07-09 00:00:55 | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0           |
|        1 | 64.68.88.22     | 2003-07-09 00:00:21 | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0         |
|       10 | 64.68.88.34     | 2003-07-09 00:02:46 | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0          |
+----------+-----------------+---------------------+---------------------+-------------------------------------------------+
6 rows in set (0.00 sec)

+7

mysql group-by

sixtyfootersdude Nov 14 '10 at 17:57


4 answers

Since I'm a newbie, I cannot post useful images, so I will try to do this with text ...

I just tested this, and it seems that fields that are NOT in the GROUP BY take their values from the FIRST row that matches the group by condition. This would also explain the perceived “randomness” that others have experienced when selecting columns that are not part of the group by clause.

Example:

Create a table called "test" with two columns named "col1" and "col2" with data that looks like this:

col1 col2
1 2
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3

Then run the following query:

select col1, col2
from test
order by col2 desc

You will get this result:

1 3
2 3
3 3
1 2
1 2
2 2
3 2
2 1
3 1

Now consider the following query:

select groupTable.col1, groupTable.col2
from (
select col1, col2
from test
order by col2 desc
) groupTable
group by groupTable.col1
order by groupTable.col1 desc

You will get this result:

3 3
2 3
1 3

Change the subquery to asc:

select col1, col2
from test
order by col2 asc

Result:

2 1
3 1
1 2
1 2
2 2
3 2
1 3
2 3
3 3

Use this again as the basis for your subquery:

select groupTable.col1, groupTable.col2
from (
select col1, col2
from test
order by col2 asc
) groupTable
group by groupTable.col1
order by groupTable.col1 desc

Result:
3 1
2 1
1 2

Now you should be able to see how the ordering of the subquery affects which values are chosen for fields that are selected but are not in the group by clause. This explains the perceived “randomness” that others have mentioned: if the subquery (or the table itself, when there is no subquery) has no ORDER BY clause, MySQL takes the rows as they come in, but by defining a sort order in a subquery you can control this behavior and get predictable results.
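The MySQL manual describes the value chosen for a nonaggregated column as indeterminate, so the subquery ordering observed above is not something the server promises to honor. As a hedged sketch of an approach that does not depend on row order at all, the same two results can be produced for the test table by naming the wanted value with an aggregate. The CREATE TABLE and INSERT statements are assumptions, reconstructed from the sample data above:

```sql
-- Assumed schema for the sample data used in this answer.
CREATE TABLE test (col1 INT, col2 INT);
INSERT INTO test (col1, col2) VALUES
    (1, 2), (1, 2), (1, 3),
    (2, 1), (2, 2), (2, 3),
    (3, 1), (3, 2), (3, 3);

-- Largest col2 per col1: returns (3, 3), (2, 3), (1, 3).
SELECT col1, MAX(col2) AS col2
FROM test
GROUP BY col1
ORDER BY col1 DESC;

-- Smallest col2 per col1: returns (3, 1), (2, 1), (1, 2).
SELECT col1, MIN(col2) AS col2
FROM test
GROUP BY col1
ORDER BY col1 DESC;
```

Because each selected column is either grouped or aggregated, these queries are also accepted when ONLY_FULL_GROUP_BY is enabled.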

+4

Nick eiden Jul 17 '12 at 19:31


I thought it takes the first row according to PRIMARY KEY or any INDEX, because it looks like it works that way, but I tried GROUP BY query on different tables and did not identify any pattern.

Therefore, I will avoid using any value of non-group columns.

0

Mario Nov 14 '10 at 18:53


GROUP BY selects the first record according to the index. Suppose the table Log_Analysis_Records_dalhousieShort has recordID as its index. The GROUP BY therefore selected recordID 11 for IPAddress 129.173.159.98 from among recordIDs 11 to 16. min and max, however, are evaluated over the rows before the group is collapsed, so their values are computed logically for you.

mysql> select recordID, IPAddress, date, httpMethod from Log_Analysis_Records_dalhousieShort GROUP BY IPADDRESS;
+----------+-----------------+---------------------+-------------------------------------------------+
| recordID | IPAddress       | date                | httpMethod                                      |
+----------+-----------------+---------------------+-------------------------------------------------+
|       11 | 129.173.159.98  | 2003-07-09 00:03:46 | GET / HTTP/1.1                                  |
|        3 | 129.173.177.214 | 2003-07-09 00:01:23 | GET / HTTP/1.1                                  |
|        8 | 64.68.88.165    | 2003-07-09 00:02:43 | GET /studentservices/responsible.shtml HTTP/1.0 |
|        2 | 64.68.88.166    | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0           |
|        1 | 64.68.88.22     | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0         |
|       10 | 64.68.88.34     | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0          |
+----------+-----------------+---------------------+-------------------------------------------------+
6 rows in set (0.00 sec)

0

ni30rocks Nov 27 '15 at 8:09


AndreKR · Accepted Answer · 2010-11-14T18:00:42+0000

Normally, using GROUP BY while listing a field in the SELECT list that is neither grouped nor wrapped in an aggregate function is invalid SQL and should raise an error.

MySQL, however, allows this and simply selects a single value randomly. Try to avoid this because it is confusing.

To prohibit this, you can say at runtime:

SET sql_mode := CONCAT('ONLY_FULL_GROUP_BY,',@@sql_mode);

or set the sql-mode option in the configuration file and/or on the command line.
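As a sketch of the runtime variant, the statements below turn the mode on for the current session only and then retry the query from the question. CONCAT_WS with NULLIF is used here instead of CONCAT purely to avoid a trailing comma when the current mode is empty; that substitution is an assumption of this sketch, not part of the statement above. The exact error text depends on the MySQL version, so it is not reproduced here.

```sql
-- Enable ONLY_FULL_GROUP_BY for the current session only.
-- NULLIF turns an empty mode string into NULL, which CONCAT_WS skips.
SET SESSION sql_mode =
    CONCAT_WS(',', 'ONLY_FULL_GROUP_BY', NULLIF(@@SESSION.sql_mode, ''));

-- Confirm that the mode is now part of the session setting.
SELECT @@SESSION.sql_mode;

-- With the mode active, MySQL should reject this query instead of
-- picking arbitrary recordID and httpMethod values per IPAddress.
SELECT recordID, IPAddress, httpMethod
FROM Log_Analysis_Records_dalhousieShort
GROUP BY IPAddress;
```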

Yes, listing two aggregate functions is completely valid.
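For instance, the last query from the question can be kept valid under ONLY_FULL_GROUP_BY by aggregating every column that is not grouped. The sketch below is one possible rewrite, not the only one. It assumes recordID is unique, and the aliases first_id, first_seen and last_seen are made up for illustration. The join fetches httpMethod from the row with the lowest recordID in each group:

```sql
SELECT r.recordID,
       r.IPAddress,
       g.first_seen,
       g.last_seen,
       r.httpMethod
FROM Log_Analysis_Records_dalhousieShort AS r
JOIN (
    -- One row per IPAddress; every selected column is grouped or aggregated.
    SELECT IPAddress,
           MIN(recordID) AS first_id,
           MIN(date)     AS first_seen,
           MAX(date)     AS last_seen
    FROM Log_Analysis_Records_dalhousieShort
    GROUP BY IPAddress
) AS g
  ON g.IPAddress = r.IPAddress
 AND g.first_id  = r.recordID   -- keep only the first record of each group
ORDER BY r.IPAddress;
```

Swapping MIN(recordID) for MAX(recordID) would return the last record of each group instead, which makes the choice of row explicit rather than left to the server.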
