Are these two queries the same - GROUP BY or DISTINCT?

Question


It seems that these two queries return the same results. Is this a coincidence or are they really the same?

1.

    SELECT t.ItemNumber,
        (SELECT TOP 1 ItemDescription
         FROM Transactions
         WHERE ItemNumber = t.ItemNumber
         ORDER BY DateCreated DESC) AS ItemDescription
    FROM Transactions t
    GROUP BY t.ItemNumber

2.

    SELECT DISTINCT(t.ItemNumber),
        (SELECT TOP 1 ItemDescription
         FROM Transactions
         WHERE ItemNumber = t.ItemNumber
         ORDER BY DateCreated DESC) AS ItemDescription
    FROM Transactions t

A bit of explanation: I'm trying to get a distinct list of items from a table full of transactions. For each item, I'm looking for the ItemNumber (the identifying field) and the most recent ItemDescription.

+7

sql sql-server sql-server-2008 group-by

Mcs Jul 28 '10 at 15:11


8 answers

Same results, but in my quick test the second query seems to have a more expensive sort step to apply the DISTINCT.

Both were beaten out of sight by ROW_NUMBER, though ...

    WITH T AS
    (
        SELECT ItemNumber,
               ItemDescription,
               ROW_NUMBER() OVER (PARTITION BY ItemNumber ORDER BY DateCreated DESC) AS RN
        FROM Transactions
    )
    SELECT * FROM T WHERE RN = 1

Edit: ... which, in turn, was soundly beaten by Joe's solution on my test setup.

Plans http://img842.imageshack.us/img842/4105/executionplan.png

Test setup

    CREATE TABLE Transactions
    (
        ItemNumber INT NOT NULL,
        ItemDescription VARCHAR(50) NOT NULL,
        DateCreated DATETIME NOT NULL
    )
    INSERT INTO Transactions
    SELECT number,
           NEWID(),
           DATEADD(day, CAST(RAND(CAST(NEWID() AS VARBINARY)) * 10000 AS INT), GETDATE())
    FROM master.dbo.spt_values
    ALTER TABLE dbo.Transactions
    ADD CONSTRAINT PK_Transactions PRIMARY KEY CLUSTERED (ItemNumber, DateCreated)

+4

Martin Smith Jul 28 '10 at 15:42


If you're on at least SQL Server 2005 and can use a CTE, this is a little cleaner, IMHO.

EDIT: As pointed out in Martin's answer, this also performs significantly better.

    ;WITH cteMaxDate AS
    (
        SELECT t.ItemNumber, MAX(DateCreated) AS MaxDate
        FROM Transactions t
        GROUP BY t.ItemNumber
    )
    SELECT t.ItemNumber, t.ItemDescription
    FROM cteMaxDate md
    INNER JOIN Transactions t
        ON md.ItemNumber = t.ItemNumber
        AND md.MaxDate = t.DateCreated

+3

Joe Stefanelli Jul 28 '10 at 15:25


Based on the data and these simple queries, both will return the same results. However, the underlying operations are very different.

DISTINCT, as AakashM beat me to pointing out, is applied to all column values, including those from subqueries and computed columns. All DISTINCT does is remove duplicates, based on all the columns involved, from the output. That is why it is generally considered a hack: people use it to get rid of duplicates without understanding why the query returns them in the first place (typically because they should be using IN or EXISTS rather than a join). PostgreSQL is the only database I know of with a DISTINCT ON clause, which does work per column the way the question seems to expect.
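The join-versus-EXISTS point can be sketched as follows. This is only an illustration: the table variables @Items and @Tx and their rows are made up, not taken from the question. Run it as a single batch.

```sql
-- Hypothetical sample data (not from the question).
DECLARE @Items TABLE (ItemNumber INT PRIMARY KEY);
DECLARE @Tx TABLE (ItemNumber INT, DateCreated DATETIME);

INSERT INTO @Items (ItemNumber) VALUES (1), (2), (3);
INSERT INTO @Tx (ItemNumber, DateCreated)
VALUES (1, '20100101'), (1, '20100301'), (2, '20100201');

-- The join repeats item 1 once per matching transaction,
-- so DISTINCT gets bolted on to hide the repeats.
SELECT DISTINCT i.ItemNumber
FROM @Items i
INNER JOIN @Tx t ON t.ItemNumber = i.ItemNumber;

-- EXISTS asks "does this item have any transaction?" directly
-- and never produces the duplicates in the first place.
SELECT i.ItemNumber
FROM @Items i
WHERE EXISTS (SELECT 1 FROM @Tx t WHERE t.ItemNumber = i.ItemNumber);
```

Both queries return items 1 and 2, but only the first needs DISTINCT to get there.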


A GROUP BY clause is different: its primary use is grouping rows so that aggregate functions are computed correctly. To serve that function, the column values will be unique, based on what is defined in the GROUP BY clause. This query would never need DISTINCT, because the values of interest are already unique.
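A minimal sketch of GROUP BY doing the job it is meant for, assuming a made-up table variable @Tx shaped like part of the Transactions table (the rows are illustrative only):

```sql
-- Hypothetical sample data (not from the question).
DECLARE @Tx TABLE (ItemNumber INT, DateCreated DATETIME);

INSERT INTO @Tx (ItemNumber, DateCreated)
VALUES (1, '20100101'), (1, '20100301'), (2, '20100201');

-- One output row per ItemNumber; the aggregates are computed per group.
SELECT ItemNumber,
       MAX(DateCreated) AS LastCreated,
       COUNT(*) AS TransactionCount
FROM @Tx
GROUP BY ItemNumber;
```

Item 1 comes back once with its latest date and a count of 2, and item 2 once with a count of 1, so adding DISTINCT on top would change nothing.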

Conclusion

This is a poor example, because it portrays DISTINCT and GROUP BY as equals when they are not.

+3

OMG Ponies Jul 28 '10 at 16:29


Yes, they will return the same results.

+2

Mike M. Jul 28 '10 at 15:14


Since you are not using any aggregate functions, SQL Server should be smart enough to treat the GROUP BY as a DISTINCT.

You may also be interested in the following Stack Overflow post for further reading on this topic:
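As a small sketch of that equivalence, using a hypothetical table variable @Tx with made-up rows:

```sql
-- Hypothetical sample data (not from the question).
DECLARE @Tx TABLE (ItemNumber INT, DateCreated DATETIME);

INSERT INTO @Tx (ItemNumber, DateCreated)
VALUES (1, '20100101'), (1, '20100301'), (2, '20100201');

-- GROUP BY with no aggregate: one row per ItemNumber.
SELECT ItemNumber
FROM @Tx
GROUP BY ItemNumber;

-- DISTINCT over the same single column: the same set of rows.
SELECT DISTINCT ItemNumber
FROM @Tx;
```

Both statements return the item numbers 1 and 2 (in no guaranteed order). Whether the optimizer produces the same plan for both is something to confirm by comparing the actual execution plans on your own data.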

Is there a difference between Group By and Distinct?

+2

Daniel Vassallo Jul 28 '10 at 15:15


GROUP BY is needed to return correct results when using aggregate functions in a SQL query. Since you are not using an aggregate function, there is no need for GROUP BY, and therefore the queries are the same.

+1

pkananen Jul 28 '10 at 15:17


Yes, they return the same results.

Normally the GROUP BY clause groups the rows by the specified column, so that an aggregate such as SUM in the select list is computed per group. So if you have a table like:

    O_Id  OrderDate   OrderPrice  Customer
    1     2008/11/12  1000        Hansen
    2     2008/10/23  1600        Nilsen
    3     2008/09/02  700         Hansen
    4     2008/09/03  300         Hansen
    5     2008/08/30  2000        Jensen
    6     2008/10/04  100         Nilsen

If you group by Customer and ask for the sum of OrderPrice, you will get:

    Customer  SUM(OrderPrice)
    Hansen    2000
    Nilsen    1700
    Jensen    2000

In contrast, DISTINCT just makes sure you don't have duplicate rows. In this case, the original table would remain the same, since each row is different from the others.
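The example above can be reproduced with a short script. The table variable name @Orders is an assumption made for this sketch; the columns and rows are the ones listed in the table.

```sql
-- Rows copied from the table above; @Orders is a hypothetical name.
DECLARE @Orders TABLE
(
    O_Id INT,
    OrderDate DATETIME,
    OrderPrice INT,
    Customer VARCHAR(50)
);

INSERT INTO @Orders (O_Id, OrderDate, OrderPrice, Customer)
VALUES (1, '20081112', 1000, 'Hansen'),
       (2, '20081023', 1600, 'Nilsen'),
       (3, '20080902', 700,  'Hansen'),
       (4, '20080903', 300,  'Hansen'),
       (5, '20080830', 2000, 'Jensen'),
       (6, '20081004', 100,  'Nilsen');

-- One row per customer with the total: Hansen 2000, Nilsen 1700, Jensen 2000.
SELECT Customer, SUM(OrderPrice) AS TotalPrice
FROM @Orders
GROUP BY Customer;

-- Every row differs in at least one column, so all six rows come back.
SELECT DISTINCT O_Id, OrderDate, OrderPrice, Customer
FROM @Orders;
```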

+1

Kyra Jul 28 '10 at 15:20


AakashM · Accepted Answer · 2010-07-28T15:31:42+0000

Your example #2 had me scratching my head for a while. I thought to myself, "You can't DISTINCT a single column, what would that mean?", until I realized what was going on.

When you write

 SELECT DISTINCT(t.ItemNumber)

you are not, despite appearances, actually asking for distinct values of t.ItemNumber! Your example #2 is actually parsed the same as

    SELECT DISTINCT
        (t.ItemNumber),
        (SELECT TOP 1 ItemDescription
         FROM Transactions
         WHERE ItemNumber = t.ItemNumber
         ORDER BY DateCreated DESC) AS ItemDescription
    FROM Transactions t

with syntactically correct but superfluous parentheses around t.ItemNumber. It is to the result set as a whole that DISTINCT applies.

In this case, since your GROUP BY groups by the column that actually varies, you get the same results. I'm actually slightly surprised that SQL Server doesn't (in the GROUP BY example) insist that the subqueried column is mentioned in the GROUP BY list.
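A quick way to see that the parentheses make no difference, using a hypothetical table variable @Tx with made-up rows (not the question's data):

```sql
-- Hypothetical sample data (not from the question).
DECLARE @Tx TABLE (ItemNumber INT, ItemDescription VARCHAR(50));

INSERT INTO @Tx (ItemNumber, ItemDescription)
VALUES (1, 'Widget'), (1, 'Widget'), (1, 'Widget v2'), (2, 'Gadget');

-- Looks like "distinct on ItemNumber", but the parentheses are just an
-- ordinary parenthesized expression around the first column.
SELECT DISTINCT (ItemNumber), ItemDescription
FROM @Tx;

-- The same query without the parentheses.
SELECT DISTINCT ItemNumber, ItemDescription
FROM @Tx;
```

Both statements return three rows: item 1 appears twice, once per distinct description, because DISTINCT compares whole rows.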
