MySQL returns the wrong rows when using GROUP BY

I have two tables:

    article('id', 'ticket_id', 'incoming_time', 'to', 'from', 'message')
    ticket('id', 'queue_id')

where tickets represent the flow of email messages between support staff and customers, and articles are the individual messages that make up the flow.

I want to find the article with the highest incoming_time (expressed as a Unix timestamp) for each ticket_id, and this is the query I am currently using:

    SELECT article.*, MAX(article.incoming_time) AS maxtime
    FROM ticket, article
    WHERE ticket.id = article.ticket_id
      AND ticket.queue_id = 1
    GROUP BY article.ticket_id

For example,

    :article:
    id --- ticket_id --- incoming_time --- to ------- from ------- message --------
    11     1             1234567           help@      client@      I need help...
    12     1             1235433           client@    help@        How can we help?
    13     1             1240321           help@      client@      Want food!
    ...

    :ticket:
    id --- queue_id
    1      1
    ...

But the result appears to be the row with the lowest article id, rather than what I'm looking for, which is the article with the highest incoming_time.

Any advice would be greatly appreciated!

2 answers

This is a classic hurdle that most MySQL programmers run into.

  • You have a ticket_id column, which is the GROUP BY argument. The distinct values in this column define the groups.
  • You have an incoming_time column, which is the argument to MAX(). The largest value in this column across the rows of each group is returned as the value of MAX().
  • You have all the other columns in the table. The values returned for these columns are arbitrary, and not necessarily from the same row as the MAX() value.

The database cannot infer that you want the values from the same row in which the maximum value occurs.

Consider the following cases:

  • There are several rows in which the same maximum value occurs. Which row should be used to supply the article.* columns?

  • You write a query that returns both MIN() and MAX(). This is legal, but which row should article.* show?

      SELECT article.*, MIN(article.incoming_time), MAX(article.incoming_time)
      FROM ticket, article
      WHERE ticket.id = article.ticket_id
        AND ticket.queue_id = 1
      GROUP BY article.ticket_id

  • You use an aggregate function such as AVG() or SUM(), where no single row has the resulting value. How should the database guess which row to display?

      SELECT article.*, AVG(article.incoming_time)
      FROM ticket, article
      WHERE ticket.id = article.ticket_id
        AND ticket.queue_id = 1
      GROUP BY article.ticket_id
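
To make the first case concrete, here is a minimal sketch that counts how many rows share the group's maximum. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, a trimmed-down article table, and hypothetical data in which articles 12 and 13 have the same incoming_time (that tie is not in the question's sample data).

```python
import sqlite3

# SQLite in memory stands in for MySQL; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article (id INTEGER PRIMARY KEY, ticket_id INTEGER, incoming_time INTEGER)"
)
# Hypothetical data: articles 12 and 13 tie for the maximum incoming_time.
conn.executemany(
    "INSERT INTO article VALUES (?, ?, ?)",
    [(11, 1, 1234567), (12, 1, 1240321), (13, 1, 1240321)],
)

# For each ticket, count the rows whose incoming_time equals the group maximum.
tied = conn.execute(
    """
    SELECT a.ticket_id, COUNT(*) AS rows_at_max
    FROM article a
    WHERE a.incoming_time = (SELECT MAX(b.incoming_time)
                             FROM article b
                             WHERE b.ticket_id = a.ticket_id)
    GROUP BY a.ticket_id
    """
).fetchall()
print(tied)
```

With two rows at the maximum, there is no single "row of the MAX()" for the database to pick.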

In most database brands, as well as in the SQL standard itself, you are not allowed to write such a query because of this ambiguity. You cannot include any column in the select list that is not inside an aggregate function or named in the GROUP BY clause.
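
As a rough illustration of that rule, the sketch below runs a query whose select list contains only the grouping column and an aggregate, so nothing in it is ambiguous. It assumes Python's built-in sqlite3 module as a stand-in for MySQL and uses only the id, ticket_id and incoming_time columns from the question's sample data.

```python
import sqlite3

# SQLite in memory stands in for MySQL; data is the question's sample, trimmed.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ticket (id INTEGER PRIMARY KEY, queue_id INTEGER);
    CREATE TABLE article (id INTEGER PRIMARY KEY, ticket_id INTEGER, incoming_time INTEGER);
    INSERT INTO ticket VALUES (1, 1);
    INSERT INTO article VALUES (11, 1, 1234567), (12, 1, 1235433), (13, 1, 1240321);
    """
)

# Every selected column is either named in GROUP BY or wrapped in an aggregate.
per_ticket_max = conn.execute(
    """
    SELECT article.ticket_id, MAX(article.incoming_time) AS maxtime
    FROM ticket
    JOIN article ON ticket.id = article.ticket_id
    WHERE ticket.queue_id = 1
    GROUP BY article.ticket_id
    """
).fetchall()
print(per_ticket_max)
```

This gives the maximum per ticket but not the rest of the matching article row, which is why a join back to article is needed.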

MySQL is more permissive (at least when the ONLY_FULL_GROUP_BY SQL mode is not enabled). It allows such queries and leaves it up to you to write them without ambiguity. If there is ambiguity, it picks the values from the row that is physically first in the group (but this depends on the storage engine).

For what it's worth, SQLite also has this behavior, but it picks the last row in the group to resolve the ambiguity. Go figure. When the SQL standard does not say what to do, the behavior is up to each vendor's implementation.

Here is a query that should solve your problem:

    SELECT a1.*, a1.incoming_time AS maxtime
    FROM ticket t
    JOIN article a1 ON (t.id = a1.ticket_id)
    LEFT OUTER JOIN article a2
      ON (t.id = a2.ticket_id AND a1.incoming_time < a2.incoming_time)
    WHERE t.queue_id = 1
      AND a2.ticket_id IS NULL;

In other words, find the row (a1) for which there is no other row (a2) with the same ticket_id and a greater incoming_time. If no row with a greater incoming_time is found, the LEFT OUTER JOIN returns NULL for the a2 columns instead of a match.
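
This anti-join can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (the query uses only standard SQL), keeps just the columns that matter, and adds hypothetical tickets 2 and 3 that do not appear in the question.

```python
import sqlite3

# SQLite in memory stands in for MySQL; tickets 2 and 3 are made-up test data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ticket (id INTEGER PRIMARY KEY, queue_id INTEGER);
    CREATE TABLE article (id INTEGER PRIMARY KEY, ticket_id INTEGER, incoming_time INTEGER);
    INSERT INTO ticket VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO article VALUES
        (11, 1, 1234567), (12, 1, 1235433), (13, 1, 1240321),
        (21, 2, 1250000), (22, 2, 1250500),
        (31, 3, 1260000);
    """
)

# Keep a1 only when no a2 exists for the same ticket with a later incoming_time.
latest = conn.execute(
    """
    SELECT a1.id, a1.ticket_id, a1.incoming_time
    FROM ticket t
    JOIN article a1 ON (t.id = a1.ticket_id)
    LEFT OUTER JOIN article a2
      ON (t.id = a2.ticket_id AND a1.incoming_time < a2.incoming_time)
    WHERE t.queue_id = 1
      AND a2.ticket_id IS NULL
    ORDER BY a1.id
    """
).fetchall()
print(latest)
```

Articles 13 and 22 come back, one per ticket in queue 1, while ticket 3 is filtered out by its queue_id. If two articles of a ticket tied for the latest incoming_time, both would be returned.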

    SELECT a1.*
    FROM article a1
    JOIN (SELECT a2.ticket_id, MAX(a2.incoming_time) AS maxtime
          FROM article a2
          JOIN ticket ON (a2.ticket_id = ticket.id)
          WHERE ticket.queue_id = 1
          GROUP BY a2.ticket_id) xx
      ON (a1.ticket_id = xx.ticket_id AND a1.incoming_time = xx.maxtime);
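
A per-ticket form of this derived-table approach can be sketched as follows: the subquery groups by ticket_id, and the outer join matches on both ticket_id and the maximum time. The sketch assumes Python's built-in sqlite3 module as a stand-in for MySQL, and tickets 2 and 3 are hypothetical data that are not in the question.

```python
import sqlite3

# SQLite in memory stands in for MySQL; tickets 2 and 3 are made-up test data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ticket (id INTEGER PRIMARY KEY, queue_id INTEGER);
    CREATE TABLE article (id INTEGER PRIMARY KEY, ticket_id INTEGER, incoming_time INTEGER);
    INSERT INTO ticket VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO article VALUES
        (11, 1, 1234567), (12, 1, 1235433), (13, 1, 1240321),
        (21, 2, 1250000), (22, 2, 1250500),
        (31, 3, 1260000);
    """
)

# The derived table xx holds one (ticket_id, maxtime) pair per ticket in queue 1;
# joining on both columns keeps one ticket's maximum from matching another ticket's article.
latest_per_ticket = conn.execute(
    """
    SELECT a1.id, a1.ticket_id, a1.incoming_time
    FROM article a1
    JOIN (SELECT a2.ticket_id, MAX(a2.incoming_time) AS maxtime
          FROM article a2
          JOIN ticket ON (a2.ticket_id = ticket.id)
          WHERE ticket.queue_id = 1
          GROUP BY a2.ticket_id) xx
      ON (a1.ticket_id = xx.ticket_id AND a1.incoming_time = xx.maxtime)
    ORDER BY a1.id
    """
).fetchall()
print(latest_per_ticket)
```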