Consider this difference in SQL performance: the first query selects all 26,000 rows, and the second selects only the first 5.
SELECT tw.*
FROM entity e
JOIN entity_tag et on et.entity_id = e.id
JOIN tag t on t.tag_id = et.tag_id
JOIN tagrelatedtweets trt on trt.FK_Tag_ID = t.tag_id
JOIN tweets tw on tw.PK_Tweet_ID = trt.FK_Tweet_ID
WHERE e.id = 765131
ORDER BY tw.[timestamp]
versus
SELECT TOP (5) tw.*
FROM entity e
JOIN entity_tag et on et.entity_id = e.id
JOIN tag t on t.tag_id = et.tag_id
JOIN tagrelatedtweets trt on trt.FK_Tag_ID = t.tag_id
JOIN tweets tw on tw.PK_Tweet_ID = trt.FK_Tweet_ID
WHERE e.id = 765131
ORDER BY tw.[timestamp]
Without TOP: CPU = 201 | Reads: 6880 | Writes: 0 | Duration: 451
With TOP (5): CPU = 302439 | Reads: 7453199 | Writes: 3169 | Duration: 74188
It just doesn't make sense to me ... Is there any other way to do this?
After Martin suggested rebuilding the statistics on all tables, there was a slight improvement, but the trick of passing the TOP count as a parameter works best.
Before rebuilding statistics:
CPU = 302439 | Reads: 7453199 | Writes: 3169 | Duration: 74188
After rebuilding statistics:
CPU = 127734 | Reads: 4100436 | Writes: 2656 | Duration: 16880
With parameter:
CPU = 218 | Reads: 6899 | Writes: 0 | Duration: 83
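For reference, the statistics rebuild mentioned above could look roughly like the sketch below. It assumes the tables live in the dbo schema and uses WITH FULLSCAN, neither of which is stated in the post, so treat both as illustrative choices rather than what was actually run.

```sql
-- Refresh optimizer statistics on every table in the join.
-- Assumption: all tables are in the dbo schema.
-- WITH FULLSCAN reads every row instead of sampling; it is slower but more accurate.
UPDATE STATISTICS dbo.entity WITH FULLSCAN;
UPDATE STATISTICS dbo.entity_tag WITH FULLSCAN;
UPDATE STATISTICS dbo.tag WITH FULLSCAN;
UPDATE STATISTICS dbo.tagrelatedtweets WITH FULLSCAN;
UPDATE STATISTICS dbo.tweets WITH FULLSCAN;
```

On large tables a full scan can take a while, so dropping WITH FULLSCAN and accepting the default sampling may be the more practical option.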
Query with parameter:
DECLARE @TOP INT; SET @TOP=5;
SELECT TOP (@TOP) tw.*
FROM entity e
JOIN entity_tag et on et.entity_id = e.id
JOIN tag t on t.tag_id = et.tag_id
JOIN tagrelatedtweets trt on trt.FK_Tag_ID = t.tag_id
JOIN tweets tw on tw.PK_Tweet_ID = trt.FK_Tweet_ID
WHERE e.id = 765131
ORDER BY tw.timestamp desc
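To compare the literal and parameterized variants side by side without a trace, something like the following sketch may help. It uses SET STATISTICS IO and SET STATISTICS TIME, which print per-table reads and CPU/elapsed time to the Messages output. These are not necessarily the same counters as the figures quoted above, and both variants here use the same ascending sort purely to keep the comparison like for like.

```sql
-- Report per-table I/O and CPU/elapsed time for each statement that follows.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Variant 1: row count given as a literal.
SELECT TOP (5) tw.*
FROM entity e
JOIN entity_tag et ON et.entity_id = e.id
JOIN tag t ON t.tag_id = et.tag_id
JOIN tagrelatedtweets trt ON trt.FK_Tag_ID = t.tag_id
JOIN tweets tw ON tw.PK_Tweet_ID = trt.FK_Tweet_ID
WHERE e.id = 765131
ORDER BY tw.[timestamp];

-- Variant 2: same query, row count supplied through a local variable.
DECLARE @TOP INT; SET @TOP = 5;
SELECT TOP (@TOP) tw.*
FROM entity e
JOIN entity_tag et ON et.entity_id = e.id
JOIN tag t ON t.tag_id = et.tag_id
JOIN tagrelatedtweets trt ON trt.FK_Tag_ID = t.tag_id
JOIN tweets tw ON tw.PK_Tweet_ID = trt.FK_Tweet_ID
WHERE e.id = 765131
ORDER BY tw.[timestamp];

-- Switch the reporting back off.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the actual execution plans of the two statements alongside these numbers should show whether the optimizer picked a different join strategy for the literal TOP.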
A final note for those of you who use Entity Framework: if you run into this behavior, you can emulate the parameterized behavior as follows:
.Take(100).ToList().Take(5)