Optimize long running SQL Server query

Question

I have the following query:

    SELECT fpa.scenario_id,
           fpa.facility_id,
           cge.CostGroupId result_total_id,
           mp_surrogate_id,
           CAST(SUM(fpa.raw_amount * cge.CostSign) AS DECIMAL(25, 13)) result_total_amount
    INTO ADM_FactProfitTotalAmount_1
    FROM #tempAmount fpa
    JOIN ResultTest cge ON cge.CostId = fpa.process_id
    WHERE fpa.scenario_id = 1
    GROUP BY fpa.scenario_id, fpa.facility_id, cge.CostGroupId, fpa.mp_surrogate_id

#tempAmount has 220 million rows.
ResultTest has 150 rows.

I have an index on #tempAmount :

 CREATE NONCLUSTERED INDEX #tempAmount_process_id ON #tempAmount(scenario_id, facility_id, mp_surrogate_id, process_id )

It takes about 1 hour to complete. Is it possible to optimize it?

EDIT:

I created an index on the ResultTest.CostId column, and slightly changed the other index and the query:

    CREATE CLUSTERED INDEX #tempFactAmount_index ON #tempAmount (process_id, facility_id, mp_surrogate_id)

    SELECT ISNULL(CAST(1 as BIGINT), 0) scenario_id,
           fpa.facility_id,
           cge.CostGroupId result_total_id,
           fpa.mp_surrogate_id,
           CAST(SUM(fpa.raw_amount * cge.CostSign) AS DECIMAL(25, 13)) result_total_amount
    INTO ADM_FactProfitTotalAmount_1
    FROM ResultTest cge
    JOIN #tempAmount fpa ON cge.CostId = fpa.process_id
    GROUP BY fpa.facility_id, fpa.mp_surrogate_id, cge.CostGroupId

Execution plan:

41% Insert into ADM_FactProfitTotalAmount_1

51% Hash Match Aggregate

2% Hash Match Inner Join

performance sql sql-server query-optimization

Andriy kuzmych Dec 05 '12 at 11:29

3 answers

Steve ford · Answer 1 · 2012-12-06T11:13:47+0000

In scenarios like this, I have found that aggregating the large table before joining it to the smaller table often helps. In this case, I would therefore use the following:

    ;WITH SUMCTE AS (
        SELECT fpa.facility_id,
               fpa.mp_surrogate_id,
               fpa.process_id,
               SUM(fpa.raw_amount) AS total_amount
        FROM #tempAmount fpa
        GROUP BY fpa.facility_id, fpa.mp_surrogate_id, fpa.process_id
    )
    SELECT CAST(1 as BIGINT) AS Scenario_id,
           SCT.facility_id,
           cge.CostGroupId result_total_id,
           SCT.mp_surrogate_id,
           CAST(SUM(SCT.total_amount * cge.CostSign) AS DECIMAL(25, 13)) result_total_amount
    INTO ADM_FactProfitTotalAmount_1
    FROM ResultTest cge
    JOIN SUMCTE SCT ON cge.CostId = SCT.process_id
    GROUP BY SCT.facility_id, SCT.mp_surrogate_id, cge.CostGroupId

If ResultTest contains only one row per process_id, I would simplify this by removing the outer GROUP BY:

    ;WITH SUMCTE AS (
        SELECT fpa.facility_id,
               fpa.mp_surrogate_id,
               fpa.process_id,
               SUM(fpa.raw_amount) AS total_amount
        FROM #tempAmount fpa
        GROUP BY fpa.facility_id, fpa.mp_surrogate_id, fpa.process_id
    )
    SELECT CAST(1 as BIGINT) AS Scenario_id,
           facility_id,
           cge.CostGroupId result_total_id,
           mp_surrogate_id,
           CAST((SCT.total_amount * cge.CostSign) AS DECIMAL(25, 13)) result_total_amount
    INTO ADM_FactProfitTotalAmount_1
    FROM ResultTest cge
    JOIN SUMCTE SCT ON cge.CostId = SCT.process_id
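Before dropping the outer GROUP BY, it may be worth confirming the assumption it rests on. The following is a minimal sketch that uses only the ResultTest columns named in the question; the output aliases row_count and cost_id_count are made up for illustration.

```sql
-- Any row returned here means CostId is not unique in ResultTest,
-- so the join can multiply rows and the outer GROUP BY is still needed.
SELECT CostId, COUNT(*) AS row_count
FROM ResultTest
GROUP BY CostId
HAVING COUNT(*) > 1;

-- Any row returned here means several CostId values feed the same cost group.
SELECT CostGroupId, COUNT(*) AS cost_id_count
FROM ResultTest
GROUP BY CostGroupId
HAVING COUNT(*) > 1;
```

If the second query returns rows, the simplified version produces one row per CostId rather than one row per CostGroupId, so the version with the outer GROUP BY is the one that matches the original query's output.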

whunmr · Answer 2 · 2012-12-05T11:43:04+0000

I suggest starting by checking the execution plan:
http://msdn.microsoft.com/en-us/library/ms191194.aspx
A multi-column index can only be used when the query's predicates cover a leftmost prefix of the index columns: http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html
Therefore, I suggest moving process_id next to scenario_id, because those are the columns used in the filter and the join:
 CREATE NONCLUSTERED INDEX #tempAmount_process_id ON #tempAmount (scenario_id, process_id, facility_id, mp_surrogate_id)
Last: let the OS cache your disk blocks in memory as much as possible. On Linux, before a critical database goes into production, run "cat your_database.store.file > /dev/null". Many disk reads will then be served from the memory cache.

Paul williams · Answer 3 · 2012-12-05T14:05:05+0000

First, I would suggest capturing the actual execution plan. If you are running the query from SQL Server Management Studio (SSMS), enable the "Include Actual Execution Plan" option. If the query is started from another program, start SQL Server Profiler and enable the Showplan Statistics Profile and/or Showplan XML Statistics Profile events. Look through the plan and see whether the query behaves as you would expect.
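As an alternative to the SSMS menu option, the actual plan can also be requested from T-SQL in the same session. This is a rough sketch rather than part of the original answer; the placeholder comment marks where the query from the question would go.

```sql
-- Return the actual execution plan as XML alongside the query results.
SET STATISTICS XML ON;
-- Also report logical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- ... run the query under investigation here ...

SET STATISTICS XML OFF;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The IO and TIME figures give a baseline to compare against after each index or query change.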

Do you have an index on the ResultTest column CostId? With only 150 rows, scanning this table is not a big deal. If you do not have an index on this table, you can try adding one.
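A sketch of such an index is shown below. The index name IX_ResultTest_CostId is hypothetical, and including CostGroupId and CostSign is an assumption made here so that the index covers every ResultTest column the query reads.

```sql
-- Hypothetical index name; the key matches the join column from the question.
CREATE NONCLUSTERED INDEX IX_ResultTest_CostId
    ON ResultTest (CostId)
    INCLUDE (CostGroupId, CostSign);  -- assumption: cover the other columns the query uses
```

With only 150 rows the optimizer may still prefer a scan, so this is unlikely to be the main source of any improvement.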

I wonder whether the execution plan uses nested loops to join ResultTest. If so, that would be 150 x 220 million = 33 billion operations. A hash join or a merge join would work much better in that case. You can force a specific join type with the query hint OPTION (HASH JOIN) or OPTION (MERGE JOIN). This alone can make a huge difference.
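For illustration, here is how the hint could be attached to the edited query from the question. This is a sketch only; the OPTION clause goes at the very end of the statement, and swapping in OPTION (MERGE JOIN) gives the other variant to compare.

```sql
SELECT ISNULL(CAST(1 AS BIGINT), 0) scenario_id,
       fpa.facility_id,
       cge.CostGroupId result_total_id,
       fpa.mp_surrogate_id,
       CAST(SUM(fpa.raw_amount * cge.CostSign) AS DECIMAL(25, 13)) result_total_amount
INTO ADM_FactProfitTotalAmount_1
FROM ResultTest cge
JOIN #tempAmount fpa ON cge.CostId = fpa.process_id
GROUP BY fpa.facility_id, fpa.mp_surrogate_id, cge.CostGroupId
OPTION (HASH JOIN);  -- forces every join in this statement to be a hash join
```

The plan posted in the question already shows a Hash Match Inner Join at 2%, so for that version of the query the hint would probably change little.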

The index on #tempAmount has many columns that are not needed for the SELECT query. Also, it is a NONCLUSTERED index. Is there a CLUSTERED index as well? If not, you can try converting it to CLUSTERED and dropping the other columns. This will reduce the size of the index and perform better, because all rows for a given scenario_id will be contiguous.
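One way this conversion might look is sketched below. The new index name and the choice of (scenario_id, process_id) as the key are assumptions made for illustration, not something the answer specifies.

```sql
-- Drop the wide nonclustered index defined in the question.
DROP INDEX #tempAmount_process_id ON #tempAmount;

-- Hypothetical narrow clustered index: the filter column first, then the join column.
CREATE CLUSTERED INDEX IX_tempAmount_scenario_process
    ON #tempAmount (scenario_id, process_id);
```

Building a clustered index on 220 million rows requires sorting the whole table, so the build time should be measured together with the query time when judging whether this helps.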
