Is it possible to optimize this simple SQL query?

I have the following query:

 SELECT COUNT(*) FROM Address adr INNER JOIN Auditable a ON adr.UniqueId = a.UniqueId 
  • There are 1.3 million addresses and more than 4 million auditables in the database.
  • Both UniqueId columns are clustered primary keys.

The query takes a lot of time. I feel stupid asking, but is there any way to optimize it? I want to count all the address records that have a matching auditable.

EDIT: all of your input is much appreciated, here are a few details:

  • The query will not be executed often (it is only for verification), but thanks for the index advice; I will review the indexes.
  • Every address has exactly one related auditable (1-to-1). Not all auditables are addresses.
  • The query takes more than 1 minute. I find that too long for a simple count.
+7
sql sql-server
8 answers

Since you have two datasets ordered by the same value... have you tried a merge join instead of a nested loop join?

 SET STATISTICS IO ON
 SET STATISTICS TIME ON

 SELECT COUNT(*) FROM Address adr INNER JOIN Auditable a ON adr.UniqueId = a.UniqueId OPTION (LOOP JOIN)
 SELECT COUNT(*) FROM Address adr INNER JOIN Auditable a ON adr.UniqueId = a.UniqueId OPTION (MERGE JOIN)
 SELECT COUNT(*) FROM Address adr INNER JOIN Auditable a ON adr.UniqueId = a.UniqueId OPTION (HASH JOIN)

Edit:

These explanations are conceptual; SQL Server can perform more sophisticated variations of the operations than my examples show. This conceptual understanding, combined with time measurements and logical IO counts from the SET STATISTICS commands, plus examination of query execution plans, forms the basis of my query optimization method (grown over four years). May it serve you as well as it has served me.

Setup

  • Get 5 decks of cards.
  • Take 1 deck and create the parent dataset.
  • Take the remaining 4 decks and create the child dataset.
  • Order each dataset by card value.
  • Let m be the number of cards in the parent dataset.
  • Let n be the number of cards in the child dataset.

Nested loop join

  • Take the card from the top of the parent dataset.
  • Search (using binary search) the child dataset for the first match.
  • Scan forward in the child dataset from that first match until a non-match is found. You have now found all matches for that card.
  • Repeat this for each card in the parent dataset.

The nested loop algorithm iterates the parent dataset once and, for each parent card, searches the child dataset once, which makes its cost: m * log(n).

Merge join

  • Take the card from the top of the parent dataset.
  • Take the card from the top of the child dataset.
  • If the cards match, remove the matching cards from the top of each deck until a non-match is found, producing every matching pair between the parent and child matches.
  • If the cards do not match, compare the parent and child cards and remove the top card from whichever dataset has the lesser one.

The merge join algorithm iterates the parent dataset once and the child dataset once, which makes its cost: m + n. It relies on the data being ordered. If you ask for a merge join on unordered data, you incur a sort operation first! That brings the cost to (m * log(m)) + (n * log(n)) + m + n. Even then, in some cases it can still be better than a nested loop.

Hash join

  • Get a card table.
  • Take each card from the parent dataset and place it on the card table somewhere you can find it again (the spot does not have to be related to the card's value, it just needs to be convenient for you).
  • Take each card from the child dataset, find its matching card on the card table, and produce the matching pair.

The hash join algorithm iterates the parent dataset once and the child dataset once, which makes its cost: m + n. It relies on having a card table large enough to hold the entire contents of the parent dataset.
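
To see which of these three physical operators the optimizer actually picked for your query, you can have SQL Server return the plan rows alongside the results; a minimal sketch, using the table names from the question:

 -- Returns plan rows after the result set; the PhysicalOp column will show
 -- Nested Loops, Merge Join, or Hash Match.
 SET STATISTICS PROFILE ON
 SELECT COUNT(*) FROM Address adr INNER JOIN Auditable a ON adr.UniqueId = a.UniqueId
 SET STATISTICS PROFILE OFF

Line the operator up with the card-deck costs above to judge whether the optimizer's choice is sensible for m = 1.3M and n = 4M.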

+11

If you run this query frequently and it needs to be very fast, create a materialized (indexed) view. There will be a small overhead on INSERT/UPDATE/DELETE, but this query will be close to instantaneous: the work can be pre-computed and stored in the index to minimize costly calculations during query execution. (A sketch follows the link below.)

Improving Performance with SQL Server 2005 Indexed Views
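
A hedged sketch of what this might look like here. I believe an indexed view containing an aggregate requires a GROUP BY, so this variant materializes just the joined keys instead; counting the narrow view index is then cheap. The view and index names are invented; the tables come from the question:

 -- Indexed views require SCHEMABINDING and two-part object names.
 CREATE VIEW dbo.AddressWithAudit
 WITH SCHEMABINDING
 AS
 SELECT adr.UniqueId
 FROM dbo.Address adr
 INNER JOIN dbo.Auditable a ON adr.UniqueId = a.UniqueId
 GO

 -- The unique clustered index is what materializes the view.
 CREATE UNIQUE CLUSTERED INDEX IX_AddressWithAudit
     ON dbo.AddressWithAudit (UniqueId)
 GO

 -- NOEXPAND makes non-Enterprise editions read the view's index directly.
 SELECT COUNT(*) FROM dbo.AddressWithAudit WITH (NOEXPAND)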

+6

The real problem is the nested loop join. For each of the 1.4 million rows in the Address table, you do an index seek into the Auditable table. That means reading a root block, a branch block, and a leaf block per row, roughly 4.2M block reads in total. The whole index is probably only 5K blocks or so... it should use a hash join instead, so it reads both indexes once and hashes through them.

Since you consider these tables large, I assume this is running on a small box without much memory. You need to make sure you have enough memory allocated to fit the entire index in memory, so the hash join is efficient.
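
If you want to test this answer's theory directly, the per-join hint form (equivalent to the OPTION (HASH JOIN) variant in the first answer) is a quick sketch:

 -- The HASH hint forces a hash join (note that a join hint in the FROM
 -- clause also fixes the join order as written).
 SELECT COUNT(*)
 FROM Address adr
 INNER HASH JOIN Auditable a ON adr.UniqueId = a.UniqueId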

+2

Is Auditable.UniqueID a foreign key reference to Address.UniqueID, meaning Auditable contains no values that do not also exist in Address?

If so, this may work and may be faster:

 SELECT COUNT(DISTINCT Auditable.UniqueID) FROM Auditable 

Note: this also assumes that UniqueID is unique (the primary key) in the Address table, but not necessarily unique in the Auditable table.
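
If you are not certain the foreign key really holds, a quick hedged check (using this answer's names) before trusting the shortcut:

 -- Should return 0 if every Auditable.UniqueID also exists in Address.
 SELECT COUNT(*) AS OrphanAuditables
 FROM Auditable aud
 WHERE NOT EXISTS (SELECT 1 FROM Address adr WHERE adr.UniqueID = aud.UniqueID)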

+1

An EXISTS clause is cheaper than an INNER JOIN.

 select COUNT(adr.UniqueId)
 from Addresses adr
 where EXISTS (
     select 1
     from Auditables aud
     where aud.UniqueId = adr.UniqueId
 )

Does this fit your need?

NB: GUIDs are very expensive for a database engine.

+1

Not sure if this will be faster, but you can try the following

 SELECT COUNT(adr.UniqueID)
 FROM Address adr
 INNER JOIN Auditable a ON adr.UniqueId = a.UniqueId

It should give you the same count, because UniqueID will never be null.

0

There is no index on the foreign key, I would guess.

  • 1.4 million and 4 million rows are not large tables; they are small. Talk to me again when you pass 500 million rows, please.

  • For a real answer, we need the query execution plan, so we can see what is actually happening.

  • And it would be nice to know what "long" means in your world (given that you think 4 million rows is a lot). This query is not going to answer in 1 second, so what do you expect, and what do you actually get?

  • I am fairly sure you have a missing index (a first check is sketched below). Short of that, I would start looking at the hardware (because I have also seen that as a cause of crap performance).
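
On the missing-index hunch: SQL Server 2005 and later record missing-index suggestions while compiling plans, so a hedged first look (treat the output as hints and verify against the actual execution plan) could be:

 -- Missing-index suggestions recorded by the optimizer since the last restart.
 SELECT d.statement AS table_name,
        d.equality_columns,
        d.inequality_columns,
        d.included_columns,
        s.user_seeks,
        s.avg_user_impact
 FROM sys.dm_db_missing_index_details AS d
 JOIN sys.dm_db_missing_index_groups AS g ON d.index_handle = g.index_handle
 JOIN sys.dm_db_missing_index_group_stats AS s ON g.index_group_handle = s.group_handle
 ORDER BY s.avg_user_impact DESC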

0

For large tables such as these, you could partition your data to improve query performance. Also, if you haven't done so already, try running the Database Engine Tuning Advisor to see whether it suggests any additional indexes. And have you reorganized your clustered indexes recently (see the sketch below)? Is that a task in your maintenance plan? Many times this will greatly improve your performance.
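
A sketch of that last check; the ALTER INDEX names are hypothetical, so substitute your actual clustered primary key index names:

 -- Check fragmentation of the Address clustered index (index_id 1).
 SELECT index_id, avg_fragmentation_in_percent
 FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Address'), NULL, NULL, 'LIMITED')

 -- Hypothetical index names: reorganize for light fragmentation,
 -- rebuild when it is heavy (commonly above ~30 percent).
 ALTER INDEX PK_Address ON dbo.Address REORGANIZE
 ALTER INDEX PK_Auditable ON dbo.Auditable REORGANIZE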

0
