T-SQL MERGE Performance in a Typical Publishing Context

I have a situation where a "publisher" application essentially keeps a view model up to date by querying a VERY complex view and then merging the results into a denormalized view table, using separate insert, update, and delete operations.
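For illustration only, a minimal sketch of that separate-statement pattern might look like the following. This is an assumption on my part, not the application's real code; it borrows the TargetTable, vw_Source_View, @Key1, and column names that appear in the query further down.

-- Hypothetical sketch of the pre-MERGE approach: three separate statements
-- reconciling TargetTable with the view output for a single @Key1.

-- Update rows that exist in both and whose data has drifted
UPDATE T
SET    T.Data1 = S.Data1, T.Data2 = S.Data2, T.Data3 = S.Data3
FROM   TargetTable AS T
JOIN   vw_Source_View AS S
       ON S.Key1 = T.Key1 AND S.Key2 = T.Key2
WHERE  T.Key1 = @Key1
  AND (T.Data1 <> S.Data1 OR T.Data2 <> S.Data2 OR T.Data3 <> S.Data3);

-- Insert rows present in the view but missing from the target
INSERT TargetTable (Key1, Key2, Data1, Data2, Data3)
SELECT S.Key1, S.Key2, S.Data1, S.Data2, S.Data3
FROM   vw_Source_View AS S
WHERE  S.Key1 = @Key1
  AND NOT EXISTS (SELECT 1 FROM TargetTable AS T
                  WHERE T.Key1 = S.Key1 AND T.Key2 = S.Key2);

-- Delete rows for this key that no longer appear in the view
DELETE T
FROM   TargetTable AS T
WHERE  T.Key1 = @Key1
  AND NOT EXISTS (SELECT 1 FROM vw_Source_View AS S
                  WHERE S.Key1 = T.Key1 AND S.Key2 = T.Key2);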

Now that we have upgraded to SQL 2008, I figured it would be great to update the table using the SQL MERGE statement. However, after writing the query, the subtree cost of the MERGE operator is 1214.54! With the old way, the sum of the Insert / Update / Delete costs was only 0.104!

I cannot figure out how a simpler way of describing the exact same operation could be so much worse. Perhaps you can see the error of my ways where I cannot.

Some stats on the table: it has 1.9 million rows, and each MERGE operation inserts, updates, or deletes no more than 100 of them. In my test case, only 1 is affected.

-- This table variable has the EXACT same structure as the published table.
-- Yes, I've tried a temp table instead of a table variable, and it makes
-- no difference.
declare @tSource table
(
    Key1 uniqueidentifier NOT NULL,
    Key2 int NOT NULL,
    Data1 datetime NOT NULL,
    Data2 datetime,
    Data3 varchar(255) NOT NULL,
    PRIMARY KEY ( Key1, Key2 )
)

-- Fill the table variable with the desired current state of the view model,
-- for only those rows affected by @Key1. I'm not really concerned about the
-- performance of this part; it's already good. This results in very few rows
-- in the table variable, in fact, only 1 in my test case.
insert into @tSource
select *
from vw_Source_View with (nolock)
where Key1 = @Key1

-- Now it's time to merge @tSource into TargetTable
;MERGE TargetTable as T
USING @tSource S
    on S.Key1 = T.Key1 and S.Key2 = T.Key2

-- Only update if the Data columns do not match
WHEN MATCHED AND (T.Data1 <> S.Data1 OR T.Data2 <> S.Data2 OR T.Data3 <> S.Data3) THEN
    UPDATE SET
        T.Data1 = S.Data1,
        T.Data2 = S.Data2,
        T.Data3 = S.Data3

-- Insert when missing in the target
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Key1, Key2, Data1, Data2, Data3)
    VALUES (Key1, Key2, Data1, Data2, Data3)

-- Delete when missing in the source, being careful not to delete the REST
-- of the table by applying the T.Key1 = @id condition
WHEN NOT MATCHED BY SOURCE AND T.Key1 = @id THEN
    DELETE
;

So how does this add up to a subtree cost of 1200? Data access on the tables themselves seems pretty efficient. In fact, 87% of the cost of the MERGE seems to come from a Sort operation near the end of the chain:

MERGE (0%) <- Index update (12%) <- Sort (87%) <- (...)

And that Sort has 0 rows flowing into and out of it. Why does it take 87% of the resources to sort 0 rows?

UPDATE

I have posted the actual (not estimated) execution plan for just the MERGE operation in a Gist.

1 answer

Subtree costs should be taken with a large grain of salt (especially when you have huge cardinality estimation errors). SET STATISTICS IO ON; SET STATISTICS TIME ON; output is a better indicator of actual performance.
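For example, one way to compare the two approaches directly (a sketch only; run each version against the same data and compare what gets reported in the Messages tab):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run the statement being measured here, e.g. the MERGE from the question
-- or the original INSERT / UPDATE / DELETE sequence, and compare the
-- logical reads, CPU time, and elapsed time that are reported.

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;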

The zero-row Sort does not take 87% of the resources. This problem in your plan is one of statistics estimation. The costs shown in the actual plan are still estimated costs; they are not adjusted to account for what actually happened.

There is a point in the plan where a filter reduces 1,911,721 rows to 0, but the estimated rows going forward are 1,860,310. Thereafter all the costs are bogus, culminating in the 87% cost of a 3,348,560-row Sort.

The cardinality estimation error can be reproduced outside the MERGE statement by looking at the estimated plan for a FULL OUTER JOIN with equivalent predicates (it gives the same 1,860,310-row estimate).

SELECT *
FROM TargetTable T
FULL OUTER JOIN @tSource S
    ON S.Key1 = T.Key1 and S.Key2 = T.Key2
WHERE CASE
        WHEN S.Key1 IS NOT NULL
        /*Matched by Source*/
        THEN CASE
               WHEN T.Key1 IS NOT NULL
               /*Matched by Target*/
               THEN CASE
                      WHEN [T].[Data1] <> S.[Data1]
                        OR [T].[Data2] <> S.[Data2]
                        OR [T].[Data3] <> S.[Data3]
                      THEN (1)
                    END
               /*Not Matched by Target*/
               ELSE (4)
             END
        /*Not Matched by Source*/
        ELSE CASE
               WHEN [T].[Key1] = @id
               THEN (3)
             END
      END IS NOT NULL

That said, however, the plan up to the filter itself does seem quite sub-optimal. It does a full clustered index scan when perhaps you would want a plan with two clustered index range seeks: one to retrieve the single row matched by the primary key from the join on the source, and the other to retrieve the T.Key1 = @id range (though maybe this is to avoid the need to sort into clustered key order later?).
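To make that concrete, the suggested access pattern corresponds roughly to the two seeks sketched below. This is an illustration only, not part of the original answer; the rewrite further down expresses the same idea as a single CTE with an OR condition and a FORCESEEK hint.

-- Rows reachable through the source join: point seeks on the clustered PK
SELECT T.*
FROM TargetTable AS T
JOIN @tSource AS S
    ON S.Key1 = T.Key1 AND S.Key2 = T.Key2

UNION

-- Rows in the @id range, needed for the NOT MATCHED BY SOURCE delete branch
SELECT T.*
FROM TargetTable AS T
WHERE T.Key1 = @id;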

Original plan

Perhaps you could try this rewrite and see if it works any better or worse.

;WITH FilteredTarget AS
(
    SELECT T.*
    FROM TargetTable AS T WITH (FORCESEEK)
    JOIN @tSource S
        ON (T.Key1 = S.Key1 AND S.Key2 = T.Key2)
        OR T.Key1 = @id
)
MERGE FilteredTarget AS T
USING @tSource S
    ON (T.Key1 = S.Key1 AND S.Key2 = T.Key2)

-- Only update if the Data columns do not match
WHEN MATCHED AND S.Key1 = T.Key1 AND S.Key2 = T.Key2
    AND (T.Data1 <> S.Data1 OR T.Data2 <> S.Data2 OR T.Data3 <> S.Data3) THEN
    UPDATE SET
        T.Data1 = S.Data1,
        T.Data2 = S.Data2,
        T.Data3 = S.Data3

-- Note from original poster: This extra "safety clause" turned out not to
-- affect the behavior or the execution plan, so I removed it and it works
-- just as well without, but if you find yourself in a similar situation
-- you might want to give it a try.
-- WHEN MATCHED AND (S.Key1 <> T.Key1 OR S.Key2 <> T.Key2) AND T.Key1 = @id THEN
--     DELETE

-- Insert when missing in the target
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Key1, Key2, Data1, Data2, Data3)
    VALUES (Key1, Key2, Data1, Data2, Data3)

WHEN NOT MATCHED BY SOURCE AND T.Key1 = @id THEN
    DELETE;
