I am exploring ways to improve application performance that I can only affect database level to a limited extent. SQL Server version is 2012 Service Pack 2 (SP2), and the table and view structure in question (I can not negatively affect this + note that an xml document can contain only a few hundred elements):
CREATE TABLE Orders( id nvarchar(64) NOT NULL, xmldoc xml NULL, CONSTRAINT PK_Order_id PRIMARY KEY CLUSTERED (id) ); CREATE VIEW V_Orders as SELECT a.id, a.xmldoc ,a.xmldoc.value('data(/row/c1)[1]', 'nvarchar(max)') "Stuff" ,a.xmldoc.value('data(/row/c2)[1]', 'nvarchar(max)') "OrderType" etc..... many columns from Orders a;
A typical query (and the one used for testing below):
SELECT id FROM V_Orders WHERE OrderType = '30791'
All queries are executed against the view, and I can neither influence the queries nor the structure of the table / view.
I thought adding a custom XML index to the table would be my savior:
CREATE SELECTIVE XML INDEX I_Orders_OrderType ON Orders(xmldoc) FOR( pathOrderType = '/row/c2' as SQL [nvarchar](20) )
But even after updating the statistics, the execution plan looks strange. Failed to post pic as a new account, so the relevant details as text:
In fact, with my test data, the query does not even return any results, but whether it returns one or more does not matter. The query support execution time does take as long as it can be deduced from the execution plan and has thousands of views.
So my question is, why is the xml selective index optimizer misused? Or is something wrong with me? How to optimize this specific query performance with selective xml indexing (or perhaps with a saved column)?
Edit: I conducted additional testing with larger sample data (~ 274 thousand rows in a table with XML documents close to average production sizes), and compared the selective XML index with an advanced column. Results are from Profiler tracing, focusing on CPU usage and number of samples. The implementation plan for selective xml indexing is basically the same as described above.
Selective XML index and 274k lines (query execution above): CPU: 6454, read: 938521
After I updated the values in the search field to be unique (total records another 274k), I got the following results:
Selective XML index and 274k lines (query execution above): CPU: 10077, read: 1006466
Then, using an advanced (i.e. constant) separately indexed column and using it directly in the view: CPU: 0, read: 23
Selective XML index performance seems to be closer to full table scans than corresponding SQL column indexing. I read somewhere that using a schema for a table can help to throw TOP TOP step away from the execution plan (provided that we are looking for a non-repeating field), but I'm not sure if this is a real possibility in this case.