Is it possible to optimize this simple SQL query?

I have the following query:

 SELECT COUNT(*) FROM Address adr INNER JOIN Auditable a ON adr.UniqueId = a.UniqueId 
  • There are 1.3 million addresses and more than 4 million auditables in the database.
  • Both UniqueId columns are clustered primary keys.

The query takes a lot of time. I feel stupid asking, but is there any way to optimize it? I want to count all the address records that have a matching auditable.

EDIT: all of your input is much appreciated, here are a few details:

  • The query will not be executed often (it is only for verification), but thanks for the index advice; I will review the indexes.
  • Every address has exactly one related auditable (1-to-1). Not all auditables are addresses.
  • The query takes more than 1 minute. I find that too long for a simple count.
+7
sql sql-server
8 answers

Since you have two datasets ordered by the same value... have you tried a merge join instead of a nested loop join?

 SET STATISTICS IO ON
 SET STATISTICS TIME ON

 SELECT COUNT(*) FROM Address adr INNER JOIN Auditable a ON adr.UniqueId = a.UniqueId OPTION (LOOP JOIN)
 SELECT COUNT(*) FROM Address adr INNER JOIN Auditable a ON adr.UniqueId = a.UniqueId OPTION (MERGE JOIN)
 SELECT COUNT(*) FROM Address adr INNER JOIN Auditable a ON adr.UniqueId = a.UniqueId OPTION (HASH JOIN)

Edit:

These explanations are conceptual; SQL Server can perform more sophisticated variations of the operations than my examples show. This conceptual understanding, combined with time measurements and logical IO counts from the SET STATISTICS commands, plus examination of query execution plans, forms the basis of my query optimization method (grown over four years). May it serve you as well as it has served me.

Setup

  • Get 5 decks of cards.
  • Take 1 deck and create the parent dataset.
  • Take the remaining 4 decks and create the child dataset.
  • Order each dataset by card value.
  • Let m be the number of cards in the parent dataset.
  • Let n be the number of cards in the child dataset.

Nested loop join

  • Take the card from the top of the parent dataset.
  • Search (using binary search) the child dataset for the first match.
  • Scan forward in the child dataset from that first match until a non-match is found. You have now found all matches for that card.
  • Repeat this for each card in the parent dataset.

The nested loop algorithm iterates the parent dataset once and, for each parent card, searches the child dataset once, which makes its cost: m * log(n).

Merge join

  • Take the card from the top of the parent dataset.
  • Take the card from the top of the child dataset.
  • If the cards match, remove the matching cards from the top of each deck until a non-match is found, producing every matching pair between the parent and child matches.
  • If the cards do not match, compare the parent and child cards and remove the top card from whichever dataset has the lesser one.

The merge join algorithm iterates the parent dataset once and the child dataset once, which makes its cost: m + n. It relies on the data being ordered. If you ask for a merge join on unordered data, you incur a sort operation first! That brings the cost to (m * log(m)) + (n * log(n)) + m + n. Even then, in some cases it can still be better than a nested loop.

Hash join

  • Get a card table.
  • Take each card from the parent dataset and place it on the card table somewhere you can find it again (the spot does not have to be related to the card's value, it just needs to be convenient for you).
  • Take each card from the child dataset, find its matching card on the card table, and produce the matching pair.

The hash join algorithm iterates the parent dataset once and the child dataset once, which makes its cost: m + n. It relies on having a card table large enough to hold the entire contents of the parent dataset.
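
To see which of these three physical operators the optimizer actually picked for your query, you can have SQL Server return the plan rows alongside the results; a minimal sketch, using the table names from the question:

 -- Returns plan rows after the result set; the PhysicalOp column will show
 -- Nested Loops, Merge Join, or Hash Match.
 SET STATISTICS PROFILE ON
 SELECT COUNT(*) FROM Address adr INNER JOIN Auditable a ON adr.UniqueId = a.UniqueId
 SET STATISTICS PROFILE OFF

Line the operator up with the card-deck costs above to judge whether the optimizer's choice is sensible for m = 1.3M and n = 4M.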

+11

If you run this query frequently and it needs to be very fast, create a materialized (indexed) view. There will be a small overhead on INSERT/UPDATE/DELETE, but this query will be close to instantaneous: the work can be pre-computed and stored in the index to minimize costly calculations during query execution. (A sketch follows the link below.)

Improving Performance with SQL Server 2005 Indexed Views
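
A hedged sketch of what this might look like here. I believe an indexed view containing an aggregate requires a GROUP BY, so this variant materializes just the joined keys instead; counting the narrow view index is then cheap. The view and index names are invented; the tables come from the question:

 -- Indexed views require SCHEMABINDING and two-part object names.
 CREATE VIEW dbo.AddressWithAudit
 WITH SCHEMABINDING
 AS
 SELECT adr.UniqueId
 FROM dbo.Address adr
 INNER JOIN dbo.Auditable a ON adr.UniqueId = a.UniqueId
 GO

 -- The unique clustered index is what materializes the view.
 CREATE UNIQUE CLUSTERED INDEX IX_AddressWithAudit
     ON dbo.AddressWithAudit (UniqueId)
 GO

 -- NOEXPAND makes non-Enterprise editions read the view's index directly.
 SELECT COUNT(*) FROM dbo.AddressWithAudit WITH (NOEXPAND)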

+6

The real problem is the nested loop join. For each of the 1.4 million rows in the Address table, you do an index seek into the Auditable table. That means reading a root block, a branch block, and a leaf block per row, roughly 4.2M block reads in total. The whole index is probably only 5K blocks or so... it should use a hash join instead, so it reads both indexes once and hashes through them.

Since you consider these tables large, I assume this is running on a small box without much memory. You need to make sure you have enough memory allocated to fit the entire index in memory, so the hash join is efficient.
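
If you want to test this answer's theory directly, the per-join hint form (equivalent to the OPTION (HASH JOIN) variant in the first answer) is a quick sketch:

 -- The HASH hint forces a hash join (note that a join hint in the FROM
 -- clause also fixes the join order as written).
 SELECT COUNT(*)
 FROM Address adr
 INNER HASH JOIN Auditable a ON adr.UniqueId = a.UniqueId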

+2

Is Auditable.UniqueID a foreign key reference to Address.UniqueID, meaning Auditable contains no values that do not also exist in Address?

If so, this may work and may be faster:

 SELECT COUNT(DISTINCT Auditable.UniqueID) FROM Auditable 

Note: this also assumes that UniqueID is unique (the primary key) in the Address table, but not necessarily unique in the Auditable table.
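
If you are not certain the foreign key really holds, a quick hedged check (using this answer's names) before trusting the shortcut:

 -- Should return 0 if every Auditable.UniqueID also exists in Address.
 SELECT COUNT(*) AS OrphanAuditables
 FROM Auditable aud
 WHERE NOT EXISTS (SELECT 1 FROM Address adr WHERE adr.UniqueID = aud.UniqueID)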

+1

An EXISTS clause is cheaper than an INNER JOIN.

 select COUNT(adr.UniqueId)
 from Addresses adr
 where EXISTS (
     select 1
     from Auditables aud
     where aud.UniqueId = adr.UniqueId
 )

Does this fit your need?

NB: GUIDs are very expensive for a database engine.

+1

Not sure if this will be faster, but you can try the following

 SELECT COUNT(adr.UniqueID)
 FROM Address adr
 INNER JOIN Auditable a ON adr.UniqueId = a.UniqueId

It should give you the same count, because UniqueID will never be null.

0

There is no index on the foreign key, I would guess.

  • 1.4 million and 4 million rows are not large tables; they are small. Talk to me again when you pass 500 million rows, please.

  • For a real answer, we need the query execution plan, so we can see what is actually happening.

  • And it would be nice to know what "long" means in your world (given that you think 4 million rows is a lot). This query is not going to answer in 1 second, so what do you expect, and what do you actually get?

  • I am fairly sure you have a missing index (a first check is sketched below). Short of that, I would start looking at the hardware (because I have also seen that as a cause of crap performance).
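
On the missing-index hunch: SQL Server 2005 and later record missing-index suggestions while compiling plans, so a hedged first look (treat the output as hints and verify against the actual execution plan) could be:

 -- Missing-index suggestions recorded by the optimizer since the last restart.
 SELECT d.statement AS table_name,
        d.equality_columns,
        d.inequality_columns,
        d.included_columns,
        s.user_seeks,
        s.avg_user_impact
 FROM sys.dm_db_missing_index_details AS d
 JOIN sys.dm_db_missing_index_groups AS g ON d.index_handle = g.index_handle
 JOIN sys.dm_db_missing_index_group_stats AS s ON g.index_group_handle = s.group_handle
 ORDER BY s.avg_user_impact DESC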

0

For large tables such as these, you could partition your data to improve query performance. Also, if you haven't done so already, try running the Database Engine Tuning Advisor to see whether it suggests any additional indexes. And have you reorganized your clustered indexes recently (see the sketch below)? Is that a task in your maintenance plan? Many times this will greatly improve your performance.
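
A sketch of that last check; the ALTER INDEX names are hypothetical, so substitute your actual clustered primary key index names:

 -- Check fragmentation of the Address clustered index (index_id 1).
 SELECT index_id, avg_fragmentation_in_percent
 FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Address'), NULL, NULL, 'LIMITED')

 -- Hypothetical index names: reorganize for light fragmentation,
 -- rebuild when it is heavy (commonly above ~30 percent).
 ALTER INDEX PK_Address ON dbo.Address REORGANIZE
 ALTER INDEX PK_Auditable ON dbo.Auditable REORGANIZE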

0
