An efficient way to get the latest date before a specified date

Suppose I have a table called Transaction and another called Price. Price holds the prices of the given funds on different dates. Each fund has prices added on various dates, but not on every possible date. So the XYZ fund might have prices for May 1, May 7 and May 13, while the ABC fund might have prices for May 3, May 9 and May 11.

Now I want to look up the price that prevailed for a fund on the date of a transaction. Say the transaction was for the XYZ fund on May 10th. What I want is the latest price known on that day, which is the price from May 7th.
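To make the example concrete, the two tables could be set up roughly as follows. This is only a sketch: the column names come from the query in this question, but the column types, the year 2011, and the price values are assumptions made for illustration. The `date` type and multi-row `VALUES` also assume SQL Server 2008 or later.

```sql
-- Hypothetical schema: column types and price values are assumed, not from the real system.
CREATE TABLE Price (
    FundCode   varchar(10)   NOT NULL,
    PriceDate  date          NOT NULL,
    OfferPrice decimal(18,4) NOT NULL
);

-- TRANSACTION is a reserved keyword in T-SQL, so the table name is bracketed.
CREATE TABLE [Transaction] (
    TransactionID   int         NOT NULL PRIMARY KEY,
    FundCode        varchar(10) NOT NULL,
    TransactionDate date        NOT NULL
);

INSERT INTO Price (FundCode, PriceDate, OfferPrice) VALUES
    ('XYZ', '20110501', 1.10),
    ('XYZ', '20110507', 1.12),
    ('XYZ', '20110513', 1.15),
    ('ABC', '20110503', 2.00),
    ('ABC', '20110509', 2.05),
    ('ABC', '20110511', 2.07);

-- One transaction for XYZ on May 10.
INSERT INTO [Transaction] (TransactionID, FundCode, TransactionDate) VALUES
    (1, 'XYZ', '20110510');
```

With this data, the desired result for transaction 1 is the May 7 price (1.12 here), not the May 13 one.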

Here is my current query:

    select d.TransactionID, d.FundCode, d.TransactionDate, v.OfferPrice
    from [Transaction] d
        inner join Price v
            on v.FundCode = d.FundCode
            and v.PriceDate = (
                select max(PriceDate)
                from Price
                where FundCode = v.FundCode
                /* */ and PriceDate < d.TransactionDate
            )

It works, but it is very slow (a few minutes on real data). If I remove the line with the leading comment, the query becomes very fast (2 seconds or so), but then it uses the latest price for the fund, which is incorrect.

The odd part is that the Price table is tiny compared to some of the other tables we use, so I don't understand why it is so slow. I suspect that the offending line forces SQL Server to process a Cartesian product, but I don't know how to avoid it.

I keep hoping to find a more efficient way to do this, but so far it has eluded me. Any ideas?

+7
3 answers

You do not specify which version of SQL Server you are using, but if it is one that supports CTEs and ranking functions, I think you will find this quite a bit more efficient than the correlated subquery in your join condition.

It should perform very similarly to Andriy's queries. Depending on the exact index layout of your tables, one approach may be slightly faster than the other.

I am inclined towards approaches based on CTE, because the resulting code is quite readable (in my opinion). Hope this helps!

    ;WITH set_gen (TransactionID, OfferPrice, Match_val) AS (
        SELECT d.TransactionID,
               v.OfferPrice,
               /* DESC so that row 1 is the latest price on or before the transaction date */
               ROW_NUMBER() OVER (PARTITION BY d.TransactionID ORDER BY v.PriceDate DESC) AS Match_val
        FROM [Transaction] d
            INNER JOIN Price v
                ON v.FundCode = d.FundCode
        WHERE v.PriceDate <= d.TransactionDate /* use < to exclude same-day prices, as in the question */
    )
    SELECT sg.TransactionID, d.FundCode, d.TransactionDate, sg.OfferPrice
    FROM [Transaction] d
        INNER JOIN set_gen sg
            ON d.TransactionID = sg.TransactionID
    WHERE sg.Match_val = 1
+4

There is a method for finding rows with maximum or minimum values that uses a LEFT JOIN of the table to itself, rather than the more intuitive but probably more expensive INNER JOIN to an aggregated list derived from the same table.

Basically the method uses this template:

    SELECT t.*
    FROM t
        LEFT JOIN t AS t2
            ON t.key = t2.key
            AND t2.Value > t.Value /* ">" is when getting maximums; "<" is for minimums */
    WHERE t2.key IS NULL

or its NOT EXISTS counterpart:

    SELECT *
    FROM t
    WHERE NOT EXISTS (
        SELECT *
        FROM t AS t2
        WHERE t.key = t2.key
          AND t2.Value > t.Value /* same as above applies to ">" here as well */
    )

So, the result is all rows for which there is no other row with the same key and a greater value.
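As a small illustration of the LEFT JOIN form, here is a sketch on toy data. The table variable `@t` and its columns `GroupKey` and `Value` are hypothetical names chosen for this example, and the multi-row `VALUES` assumes SQL Server 2008 or later.

```sql
-- Toy data; @t, GroupKey and Value are made-up names for illustration.
DECLARE @t TABLE (GroupKey char(1) NOT NULL, Value int NOT NULL);

INSERT INTO @t (GroupKey, Value) VALUES ('A', 1), ('A', 3), ('B', 2);

-- Keep only rows for which no row with the same key has a larger Value.
SELECT t.GroupKey, t.Value
FROM @t AS t
    LEFT JOIN @t AS t2
        ON  t2.GroupKey = t.GroupKey
        AND t2.Value > t.Value
WHERE t2.GroupKey IS NULL;
-- Returns the rows (A, 3) and (B, 2).
```

Note that if several rows tie for the maximum within a key, all of them are returned.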

When there is only one table, applying the above method is quite simple. It may be less obvious how to apply it when a second table is involved, especially when, as in your case, that table not only makes the query more complex by its mere presence but also supplies additional filtering for the values we are looking for, namely the upper bounds for the dates.

So, here is what the resulting query might look like when using the LEFT JOIN version of the method:

    SELECT d.TransactionID, d.FundCode, d.TransactionDate, v.OfferPrice
    FROM [Transaction] d
        INNER JOIN Price v
            ON v.FundCode = d.FundCode
            AND v.PriceDate < d.TransactionDate /* only prices before the transaction qualify */
        LEFT JOIN Price v2
            ON v2.FundCode = v.FundCode /* this and */
            AND v2.PriceDate > v.PriceDate /* this are where we are applying the above method; */
            AND v2.PriceDate < d.TransactionDate /* and this is where we are limiting the maximum value */
    WHERE v2.FundCode IS NULL

And here is a similar solution with NOT EXISTS:

    SELECT d.TransactionID, d.FundCode, d.TransactionDate, v.OfferPrice
    FROM [Transaction] d
        INNER JOIN Price v
            ON v.FundCode = d.FundCode
            AND v.PriceDate < d.TransactionDate /* only prices before the transaction qualify */
    WHERE NOT EXISTS (
        SELECT *
        FROM Price v2
        WHERE v2.FundCode = v.FundCode /* this and */
          AND v2.PriceDate > v.PriceDate /* this are where we are applying the above method; */
          AND v2.PriceDate < d.TransactionDate /* and this is where we are limiting the maximum value */
    )
+5

Are PriceDate and TransactionDate indexed? If not, you are doing table scans, which is probably the cause of the performance bottleneck.
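If they are not, something along these lines might be a starting point. This is only a sketch: the index names are hypothetical, the column choice assumes the queries filter on FundCode first and then on the date, and whether these indexes actually help should be checked against the execution plan on the real data.

```sql
-- Hypothetical index names; adjust columns to the real schema and workload.
-- Lets the engine seek to a fund and then to the latest PriceDate below a bound;
-- OfferPrice is included so the lookup can be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_Price_FundCode_PriceDate
    ON Price (FundCode, PriceDate)
    INCLUDE (OfferPrice);

-- Supports the join from transactions to prices by fund and date.
-- The table name is bracketed because TRANSACTION is a reserved keyword in T-SQL.
CREATE NONCLUSTERED INDEX IX_Transaction_FundCode_TransactionDate
    ON [Transaction] (FundCode, TransactionDate);
```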

0
