Why is one query extremely slow when an identical query on a similar table runs in no time?

Question

I have this query, which runs very slowly (almost a minute):

select distinct main.PrimeId
from PRIME main
join (
    select distinct p.PrimeId
    from PRIME p
    left outer join ATTRGROUP a
        on p.PrimeId = a.PrimeId or p.PrimeId = a.RelatedPrimeId
    where a.PrimeId is not null and a.RelatedPrimeId is not null
) mem on main.PrimeId = mem.PrimeId

The PRIME table has 18k rows and has a PK on PrimeId.

The ATTRGROUP table has 24k rows and has a composite PK on PrimeId, col2, then RelatedPrimeId, and then cols 4-7. There is also a separate index on RelatedPrimeId.

The query ultimately returns 8.5k rows: the distinct PrimeId values in the PRIME table that match either PrimeId or RelatedPrimeId in the ATTRGROUP table.

I have the same query using ATTRADDRESS instead of ATTRGROUP. ATTRADDRESS has the same key and index structure as ATTRGROUP. It has only 11k rows in it, which admittedly is fewer, but in that case the query runs in about a second and returns 11k rows.

So my question is:

How can a query be so much slower on one table than on another when the structures are identical?

So far I have tried this on SQL 2005 and (using the same database, upgraded) SQL 2008 R2. Two of us have independently obtained the same results by restoring the same backup on two different computers.

Other information:

The part inside the brackets runs in less than a second on its own, even in the slow query.

In the execution plan there is a possible clue that I do not understand. Here is part of it, with a suspicious operation on 320 million rows:

[Image: excerpt of the execution plan showing the 320-million-row operation]

However, the actual number of rows in this table is a little more than 24k, not 320M!

If I rewrite the part of the query inside the brackets so that it uses UNION instead of OR, like this:

 select distinct main.PrimeId
 from PRIME main
 join (
     select distinct p.PrimeId
     from PRIME p
     left outer join ATTRGROUP a on p.PrimeId = a.PrimeId
     where a.PrimeId is not null and a.RelatedPrimeId is not null
     UNION
     select distinct p.PrimeId
     from PRIME p
     left outer join ATTRGROUP a on p.PrimeId = a.RelatedPrimeId
     where a.PrimeId is not null and a.RelatedPrimeId is not null
 ) mem on main.PrimeId = mem.PrimeId

... then the slow query takes less than a second.

I'd be very grateful for any insight into this! Let me know if you need more information and I will update the question. Thanks!

By the way, I realise that there is a redundant join in this example. It is not easy to remove, because in production all of this is generated dynamically, and the part in brackets takes many different forms.

Edit

I have rebuilt the indexes on ATTRGROUP; it made no significant difference.

Edit 2

If I use a temporary table like this:

 select distinct p.PrimeId
 into #temp
 from PRIME p
 left outer join ATTRGROUP a
     on p.PrimeId = a.PrimeId or p.PrimeId = a.RelatedPrimeId
 where a.PrimeId is not null and a.RelatedPrimeId is not null

 select distinct main.PrimeId
 from Prime main
 join #temp mem on main.PrimeId = mem.PrimeId

... then, even with the OR in the original OUTER JOIN, it runs in less than a second. I hate temporary tables like this, because using them always feels like an admission of defeat, so it is not the refactoring I will be using, but I thought it was interesting that it makes a difference.

Edit 3

Updating statistics makes no difference either.

Thanks for all your suggestions.

sql-server query-performance

Chrisa Aug 12 '11 at 7:55

4 answers

Willmckill · Answer 1 · 2011-08-12T08:51:33+0000

In my experience, it's better to use two left joins rather than OR in a JOIN clause. Therefore, instead of:

  left outer join ATTRGROUP a on p.PrimeId = a.PrimeId or p.PrimeId = a.RelatedPrimeId

I would suggest:

  left outer join ATTRGROUP a on p.PrimeId = a.PrimeId left outer join ATTRGROUP a2 on p.PrimeId = a2.RelatedPrimeId
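
For context, here is a rough sketch of how the two left joins might slot into the derived table from the question. The `where` clause is my own adaptation, not part of the answer above: it assumes PrimeId and RelatedPrimeId are never null in ATTRGROUP (both are part of its primary key), so a non-null value simply means that the join found a match.

```sql
select distinct main.PrimeId
from PRIME main
join (
    select distinct p.PrimeId
    from PRIME p
    left outer join ATTRGROUP a  on p.PrimeId = a.PrimeId
    left outer join ATTRGROUP a2 on p.PrimeId = a2.RelatedPrimeId
    -- keep a PRIME row if either join found a match
    where a.PrimeId is not null
       or a2.RelatedPrimeId is not null
) mem on main.PrimeId = mem.PrimeId;
```

Note that two left joins can multiply the intermediate rows for a PrimeId that matches on both columns; the `distinct` removes those duplicates again.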

Daan remmers · Answer 2 · 2011-11-25T09:26:04+0000

I noticed that the main query does not correlate with the subquery:

 select distinct main.PrimeId
 from PRIME main
 join (
     select distinct p.PrimeId
     from PRIME p
     left outer join ATTRGROUP a on p.PrimeId = a.PrimeId
     where main.PrimeId = a.PrimeId   -- added correlation
     UNION
     select distinct p.PrimeId
     from PRIME p
     left outer join ATTRGROUP a on p.PrimeId = a.RelatedPrimeId
     where main.PrimeId = a.PrimeId   -- added correlation
 ) mem on main.PrimeId = mem.PrimeId

With this construct you also no longer need the 'is not null' conditions (do you need them at all, given that a primary key column can never contain a null value?).

I was taught to avoid OR constructs (as others have already recommended), but also to avoid 'is not null' and 'in (value list)' constructs. They can mostly be replaced with a (NOT) EXISTS expression.
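
As a sketch of what the (NOT) EXISTS form could look like for the query in the question: in T-SQL a derived table in a plain JOIN cannot reference the outer alias, so the correlation is usually written with EXISTS (or APPLY) instead. This version assumes, as noted above, that the ATTRGROUP key columns are never null, so the 'is not null' checks are dropped; it has not been timed against the asker's data.

```sql
-- PrimeId is the primary key of PRIME, so no DISTINCT is needed here.
select main.PrimeId
from PRIME main
where exists (select 1
              from ATTRGROUP a
              where a.PrimeId = main.PrimeId)
   or exists (select 1
              from ATTRGROUP a
              where a.RelatedPrimeId = main.PrimeId);
```

Each EXISTS probes ATTRGROUP on a single column, so the PK and the separate index on RelatedPrimeId can each serve one of the two lookups.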

AK · Answer 3 · 2011-08-12T14:54:55+0000

This is not a direct answer, but if you have FK constraints that link ATTRGROUP.PrimeId and ATTRGROUP.RelatedPrimeId to PRIME, then your query is equivalent to this much simpler one:

 select PrimeId from ATTRGROUP a union select RelatedPrimeId from ATTRGROUP a

Hlgem · Answer 4 · 2011-08-12T14:59:52+0000

One reason a query can be much slower on one table than on another is that the statistics on that table are out of date, so the optimizer chooses a bad query plan.
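
A minimal way to check for this, assuming the table lives in the `dbo` schema (the schema name is a guess; adjust it as needed). The asker reports in Edit 3 that updating statistics made no difference here, so treat this as a way to rule the cause out rather than as a fix.

```sql
-- When were the statistics on ATTRGROUP last updated?
select s.name as stats_name,
       stats_date(s.object_id, s.stats_id) as last_updated
from sys.stats s
where s.object_id = object_id('dbo.ATTRGROUP');

-- Refresh them by reading every row rather than a sample.
update statistics dbo.ATTRGROUP with fullscan;
```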

However, I would in any case support the refactoring that others have proposed, which gets rid of the OR clause.
