Avoid repeating a subquery that references multiple joined tables

I have a subquery (LastActivityOn) that I would like to use in three places: the SELECT list, the WHERE clause and the ORDER BY clause.

    SELECT TOP 175
        (SELECT MAX(ActivityDate)
         FROM (VALUES (UserRegistration.CreatedOn),
                      (UserRegistration.ActivatedOn),
                      (UserRegistration.LastLoginOn),
                      (UserRegistration.UpdatedOn),
                      (UserProfile.LastPostedOn)) AS AllDates(ActivityDate)) LastActivityOn,
        UserRegistration.FirstName,
        UserRegistration.LastName,
        [15 more columns of various calculated distances, coalesces, etc...]
    FROM UserRegistration
    INNER JOIN UserProfile ON UserRegistration.Id = UserProfile.RegistrationId
    INNER JOIN (
        SELECT PostalCode, GeoCenter, PrimaryCity, StateOrProvince
        FROM PostalCodes
        WHERE @OriginPostalCode IS NULL
           OR PostalCodes.GeoCenter.STDistance(@OriginPoint) < @WithinMeters
    ) AS ProximalPostalCodes ON ProximalPostalCodes.PostalCode = UserRegistration.PostalCode
    [7 more joins including full-text queries]
    WHERE LastActivityOn > @OldestUserToSearch
      AND [20 more lines of filtering logic]
    ORDER BY LOG(DATEDIFF(WEEK, LastActivityOn, @Today))/LOG(2), FullTextRelevance

Notice the three references to LastActivityOn. Also note that the LastActivityOn subquery refers to two tables. Since it depends on the joins in the parent query, I suppose this is essentially a correlated subquery?

When I was taking the maximum of two dates with a user-defined function, I was able to use the resulting value in my WHERE and ORDER BY. Now I can't.

I seem to have a few options. I could wrap the whole thing in another query, repeating the projection with just the activity column added. It looks like I could use WITH (a CTE) in the same way.

But since I don't clearly understand the rules for when I can and cannot use a subquery the way I want, I could easily be missing something. Any ideas?

Or will SQL Server be smart enough to perform the calculation only once per output row, so that I shouldn't worry about it?

EDIT: I'm currently running SQL Server 2008 Standard, but upgrading at some point would be fine. Also, RE: the LOG function: I'm working on combining it with relevance as a weighted sum, so that part is a work in progress. I'll either truncate it with INT to use it as a kind of ranking, or feed it into a linear fit.

CORRECTION: I was able to use the subquery alias in my ORDER BY, but not inside any additional calculation, and not in the WHERE clause. Thanks to ypercube for pointing this out.
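Here is a minimal, self-contained T-SQL sketch of that rule. It assumes SQL Server 2008 or later, because it uses the VALUES row constructor. The table variable @T and its columns A and B are hypothetical. SQL Server logically evaluates WHERE before the SELECT list, so a SELECT-list alias does not exist yet when WHERE runs. In ORDER BY, the alias is resolved only when it appears on its own, not inside an expression.

```sql
-- Hypothetical two-column table, only to show where a SELECT-list alias is visible.
DECLARE @T TABLE (A INT, B INT);
INSERT INTO @T (A, B) VALUES (1, 5), (7, 2);

-- Works: the alias is referenced as a bare name in ORDER BY.
SELECT (SELECT MAX(v) FROM (VALUES (t.A), (t.B)) AS x(v)) AS Greatest
FROM @T AS t
ORDER BY Greatest;

-- Both of these fail with "Invalid column name 'Greatest'":
--   ... ORDER BY Greatest * 2     (alias used inside an expression)
--   ... WHERE Greatest > 3        (WHERE is evaluated before the SELECT list)
```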

3 answers

I think including this join does what I need:

    OUTER APPLY (
        SELECT MAX(ActivityDate) LastActivityOn
        FROM (VALUES (UserRegistration.CreatedOn),
                     (UserRegistration.ActivatedOn),
                     (UserRegistration.LastLoginOn),
                     (UserRegistration.UpdatedOn),
                     (UserProfile.LastPostedOn)) AS AllDates(ActivityDate)
    ) LastActivity

I also added it as a WHERE criterion, which can be disabled by passing NULL for the parameter:

    WHERE (@OldestUserToSearch IS NULL OR LastActivityOn > @OldestUserToSearch)
      AND ...
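The sketch below combines those pieces into a minimal, runnable OUTER APPLY example for SQL Server 2008 or later. Several parts are hypothetical stand-ins for the real schema: the table variables, the trimmed column list (only CreatedOn, LastLoginOn and LastPostedOn), the sample rows, and the values of @OldestUserToSearch and @Today. The key point is that LastActivity.LastActivityOn is an ordinary column of the FROM clause. It can therefore appear in the SELECT list, in WHERE and inside ORDER BY expressions without repeating the subquery.

```sql
-- Hypothetical, trimmed-down versions of the two tables from the question.
DECLARE @UserRegistration TABLE (
    Id          INT PRIMARY KEY,
    FirstName   NVARCHAR(50) NOT NULL,
    CreatedOn   DATETIME NOT NULL,
    LastLoginOn DATETIME NULL
);
DECLARE @UserProfile TABLE (
    RegistrationId INT PRIMARY KEY,
    LastPostedOn   DATETIME NULL
);
INSERT INTO @UserRegistration (Id, FirstName, CreatedOn, LastLoginOn) VALUES
    (1, N'Ann',  '20120101', '20120601'),
    (2, N'Bob',  '20110101', '20110201'),
    (3, N'Cleo', '20120301', NULL),
    (4, N'Dan',  '20100101', NULL);
INSERT INTO @UserProfile (RegistrationId, LastPostedOn) VALUES
    (1, NULL), (2, '20110701'), (3, '20120401'), (4, '20120501');

DECLARE @OldestUserToSearch DATETIME = '20120315';  -- set to NULL to disable the filter
DECLARE @Today DATETIME = '20120701';

-- Results are captured in a table variable so they can be inspected afterwards.
DECLARE @Result TABLE (FirstName NVARCHAR(50), LastActivityOn DATETIME);

INSERT INTO @Result (FirstName, LastActivityOn)
SELECT TOP (175) r.FirstName, LastActivity.LastActivityOn
FROM @UserRegistration AS r
INNER JOIN @UserProfile AS p ON r.Id = p.RegistrationId
OUTER APPLY (
    -- Latest of the dates from both joined tables; MAX ignores NULLs.
    SELECT MAX(ActivityDate) AS LastActivityOn
    FROM (VALUES (r.CreatedOn), (r.LastLoginOn), (p.LastPostedOn)) AS AllDates(ActivityDate)
) AS LastActivity
WHERE (@OldestUserToSearch IS NULL OR LastActivity.LastActivityOn > @OldestUserToSearch)
-- The APPLY column can also be used inside an ORDER BY expression.
ORDER BY DATEDIFF(WEEK, LastActivity.LastActivityOn, @Today), r.Id;

SELECT FirstName, LastActivityOn FROM @Result;
```

Dan's registration dates are all old, so he passes the filter only because of LastPostedOn from the second table. APPLY can only see tables to its left, which is why it has to come after the join to the profile table.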

Results

Performance when using this and referencing it in the SELECT list was identical to the subquery version on SQL Server 2008.

Adding the WHERE predicate is where things start to get hairy. The postal-code radius search you can see in the original question is the most expensive part of the calculation, and it performed best near the top of the plan, closest to the TOP 175. Unfortunately, when I reused the OUTER APPLY output in several places, the optimizer moved the radius search five levels deeper into the execution plan, where the distance calculation ended up being performed against many more rows. As a result, the query took about 6 times as long.

Since performance was the same for an equivalent query and the approach led to less code (no need to repeat my projection or wrap the entire query in a CTE or subquery), I am going to call OUTER APPLY the answer I was looking for. Separately, if I need to force the GIS search into an outer nested loop regardless, I will have to restructure the query for that.

Summary of options presented: How to avoid repeating a computed expression multiple times in the same SELECT?

Some useful similar uses of APPLY:

Similar examples using subqueries and CTEs (which I rejected as answers):

Unrelated or unhelpful articles with similar titles:


I won't try to rewrite your query, but maybe a common table expression (CTE) is what you need.
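Below is a minimal sketch of the CTE approach for SQL Server 2008 or later. The table variables, trimmed columns and sample rows are hypothetical stand-ins for the real schema. The CTE computes LastActivityOn once in its SELECT list, and the outer query can then filter and sort on it by name. SQL Server expands a non-recursive CTE inline, like a derived table, rather than materializing it. This removes the repeated code, but it does not by itself guarantee that the expression is evaluated only once.

```sql
-- Hypothetical, trimmed-down versions of the two tables from the question.
DECLARE @UserRegistration TABLE (
    Id          INT PRIMARY KEY,
    FirstName   NVARCHAR(50) NOT NULL,
    CreatedOn   DATETIME NOT NULL,
    LastLoginOn DATETIME NULL
);
DECLARE @UserProfile TABLE (
    RegistrationId INT PRIMARY KEY,
    LastPostedOn   DATETIME NULL
);
INSERT INTO @UserRegistration (Id, FirstName, CreatedOn, LastLoginOn) VALUES
    (1, N'Ann',  '20120101', '20120601'),
    (2, N'Bob',  '20110101', '20110201'),
    (3, N'Cleo', '20120301', NULL),
    (4, N'Dan',  '20100101', NULL);
INSERT INTO @UserProfile (RegistrationId, LastPostedOn) VALUES
    (1, NULL), (2, '20110701'), (3, '20120401'), (4, '20120501');

DECLARE @OldestUserToSearch DATETIME = '20120315';
DECLARE @Today DATETIME = '20120701';
DECLARE @Result TABLE (FirstName NVARCHAR(50), LastActivityOn DATETIME);

-- The statement before WITH must end with a semicolon (the DECLARE above does).
WITH Candidates AS (
    SELECT r.FirstName,
           (SELECT MAX(ActivityDate)
            FROM (VALUES (r.CreatedOn), (r.LastLoginOn), (p.LastPostedOn))
                 AS AllDates(ActivityDate)) AS LastActivityOn
    FROM @UserRegistration AS r
    INNER JOIN @UserProfile AS p ON r.Id = p.RegistrationId
)
INSERT INTO @Result (FirstName, LastActivityOn)
SELECT TOP (175) FirstName, LastActivityOn
FROM Candidates
-- LastActivityOn is now a real column, usable in WHERE and in ORDER BY expressions.
WHERE @OldestUserToSearch IS NULL OR LastActivityOn > @OldestUserToSearch
ORDER BY DATEDIFF(WEEK, LastActivityOn, @Today), FirstName;

SELECT FirstName, LastActivityOn FROM @Result;
```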


You cannot use the LastActivityOn alias in the WHERE clause, but you can use it in the ORDER BY (as a bare column name, not inside an expression).

If you do not want to repeat the code in two places (SELECT and WHERE), you can use a CTE. Alternatively, you can compute LastActivityOn, together with the rest of the query, in a derived table and then use it at the outer level:

    SELECT TOP 175
        LastActivityOn, FirstName, LastName, ...
    FROM
      ( SELECT
            ( SELECT MAX(ActivityDate)
              FROM
                ( VALUES (UserRegistration.CreatedOn),
                         (UserRegistration.ActivatedOn),
                         (UserRegistration.LastLoginOn),
                         (UserRegistration.UpdatedOn),
                         (UserProfile.LastPostedOn)
                ) AS AllDates(ActivityDate)
            ) LastActivityOn,
            UserRegistration.FirstName,
            UserRegistration.LastName,
            [15 more columns of various calculated distances, coalesces, etc...]
        FROM UserRegistration
        INNER JOIN UserProfile ON UserRegistration.Id = UserProfile.RegistrationId
        INNER JOIN
          ( SELECT PostalCode, GeoCenter, PrimaryCity, StateOrProvince
            FROM PostalCodes
            WHERE @OriginPostalCode IS NULL
               OR PostalCodes.GeoCenter.STDistance(@OriginPoint) < @WithinMeters
          ) AS ProximalPostalCodes
          ON ProximalPostalCodes.PostalCode = UserRegistration.PostalCode
        [7 more joins including full-text queries]
        WHERE [20 or more lines of filtering logic]
      ) AS tmp
    WHERE LastActivityOn > @OldestUserToSearch
      AND [any of the 20 lines that reference LastActivityOn]
    ORDER BY LOG(DATEDIFF(WEEK, LastActivityOn, @Today))/LOG(2),
             FullTextRelevance ;

SQL Server is probably smart enough not to execute the same code twice, but that may depend on the version you are running. The optimizer has improved a lot from 2000 to 2012, and Express or other editions may not have the same capabilities as Standard or Enterprise Edition.


This is not related to the question, but LOG() is monotonically increasing, and dividing by the positive constant LOG(2) preserves order. So I think that:

 ORDER BY LOG(DATEDIFF(WEEK, LastActivityOn, @Today))/LOG(2) 

is equivalent to the simpler:

 ORDER BY DATEDIFF(WEEK, LastActivityOn, @Today)
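Here is a small, self-contained check of that equivalence in SQL Server. The table variable @Weeks and its values are hypothetical. Two caveats apply. First, in SQL Server, LOG() of zero or a negative number raises an "invalid floating point operation" error, so the LOG version fails for anyone whose DATEDIFF is 0 (active in the current week), while the plain DATEDIFF does not. Second, the equivalence holds only while the expression is used as a sort key on its own. The question's edit plans either to truncate it to INT, which creates ties that FullTextRelevance then reorders, or to combine it with relevance in a weighted sum; in both cases the two forms are no longer interchangeable.

```sql
-- Hypothetical week counts, including a tie (8 appears twice).
DECLARE @Weeks TABLE (Id INT PRIMARY KEY, WeeksAgo INT NOT NULL);
INSERT INTO @Weeks (Id, WeeksAgo) VALUES (1, 8), (2, 1), (3, 30), (4, 3), (5, 8);

-- Rank the rows both ways; Id breaks ties the same way in both orderings.
SELECT Id,
       WeeksAgo,
       ROW_NUMBER() OVER (ORDER BY LOG(WeeksAgo) / LOG(2), Id) AS RankByLog2,
       ROW_NUMBER() OVER (ORDER BY WeeksAgo, Id)              AS RankByWeeks
FROM @Weeks
ORDER BY RankByWeeks;
-- RankByLog2 and RankByWeeks agree on every row.
-- A row with WeeksAgo = 0 would make LOG() raise an error; the plain ordering would not.
```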
