SQL Server: selective XML index not used efficiently

I am exploring ways to improve the performance of an application that I can influence only to a limited extent, at the database level. The SQL Server version is 2012 Service Pack 2 (SP2), and the table and view structure in question is shown below. Unfortunately I cannot change this structure; note also that an XML document can contain up to a few hundred elements.

 CREATE TABLE Orders(
     id nvarchar(64) NOT NULL,
     xmldoc xml NULL,
     CONSTRAINT PK_Order_id PRIMARY KEY CLUSTERED (id)
 );

 CREATE VIEW V_Orders as
 SELECT a.id, a.xmldoc
     ,a.xmldoc.value('data(/row/c1)[1]', 'nvarchar(max)') "Stuff"
     ,a.xmldoc.value('data(/row/c2)[1]', 'nvarchar(max)') "OrderType"
     etc..... many columns
 from Orders a;

A typical query (and the one used for testing below):

 SELECT id FROM V_Orders WHERE OrderType = '30791' 

All queries are executed against the view, and I can neither influence the queries nor the structure of the table / view.

I thought adding a selective XML index to the table would be my savior:

 CREATE SELECTIVE XML INDEX I_Orders_OrderType ON Orders(xmldoc)
 FOR(
     pathOrderType = '/row/c2' as SQL [nvarchar](20)
 )

But even after updating the statistics, the execution plan looks strange. As a new account I cannot post a picture, so here are the relevant details as text:

  • Clustered Index Seek on the selective XML index (cost: 2% of total). The estimated number of rows is 1, but the estimated number of executions is 1269 (the number of rows in the table)
  • → Top N Sort (cost: 95% of total)
  • → Compute Scalar (cost: 0)

  • Separate branch: Clustered Index Scan of PK_Order_id (cost: 3% of total). Estimated number of rows: 1269

  • → Joined with the Compute Scalar results using Nested Loops (Left Outer Join)
  • → Filter
  • → Final result (estimated number of rows: 1269)

In fact, with my test data the query does not return any rows at all, but whether it returns one row or more does not matter. The measured execution times confirm that the query takes as long as the execution plan suggests, with reads in the thousands.

So my question is: why does the optimizer use the selective XML index so poorly? Or am I doing something wrong? How can I optimize the performance of this specific query with a selective XML index (or perhaps with a persisted column)?

Edit: I did additional testing with larger sample data (~274 thousand rows in the table, with XML documents close to average production size) and compared the selective XML index with a promoted column. The results are from a Profiler trace, focusing on CPU usage and the number of reads. The execution plan for the selective XML index is basically the same as described above.

Selective XML index and 274k rows (the query above): CPU: 6454, reads: 938521

After I updated the values in the searched field to be unique (total record count still 274k), I got the following results:

Selective XML index and 274k rows (the query above): CPU: 10077, reads: 1006466

Then, using a promoted (i.e. persisted), separately indexed column and referencing it directly in the view: CPU: 0, reads: 23
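For anyone who wants to reproduce this kind of comparison without a Profiler trace, here is a minimal sketch using session-level statistics. It only assumes the V_Orders view and the test query from above; the figures it reports (logical reads, CPU time) are comparable to, but not identical with, the Profiler columns quoted here.

```sql
-- Sketch only: report I/O and CPU figures for the test query in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- The same test query as in the question.
SELECT id FROM V_Orders WHERE OrderType = '30791';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running it once per variant (selective XML index, then promoted column) gives logical reads per table and CPU time in milliseconds for each run.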

Selective XML index performance seems to be closer to a full table scan than to an index on a regular SQL column. I read somewhere that using an XML schema for the table can help to eliminate the Top N Sort step from the execution plan (provided that we are searching on a non-repeating field), but I'm not sure whether that is a realistic option in this case.

+5
2 answers

A selective XML index is stored in an internal table, with the primary key from Orders as the leading column of the internal table's clustered key and the specified paths stored as sparse columns.

The query plan you get probably looks something like this:

[Image: execution plan for the query against the view]

You have a scan over the Orders table, with a seek on the primary key of the internal table for each row in Orders. The final Filter operator is responsible for checking the value of OrderType, returning only the matching rows.

Not what you would expect from something called an index.

Secondary selective XML indexes come to the rescue. Each one is created for a single path specified in the primary selective index, and it creates a nonclustered key on the values extracted by the path expression.

However, it is not that simple. SQL Server will not use the secondary index for predicates on values extracted with the value() function. Instead, you have to use exist(). In addition, exist() requires XQuery data types in the path expressions, whereas value() uses SQL data types.

Your primary selective XML index could look like this:

 CREATE SELECTIVE XML INDEX I_Orders_OrderType ON Orders(xmldoc)
 FOR (
     pathOrderType = '/row/c2' as sql nvarchar(20),
     pathOrderTypeX = '/row/c2/text()' as xquery 'xs:string' maxlength (20)
 )

With a secondary index on pathOrderTypeX:

 CREATE XML INDEX I_Orders_OrderType2 ON Orders(xmldoc) USING XML INDEX I_Orders_OrderType FOR (pathOrderTypeX) 
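To double-check that both indexes and their promoted paths were created as intended, the catalog views can be queried. This is a minimal sketch, assuming the Orders table lives in the current database and default schema; the column selection is just what I would look at, not a required set.

```sql
-- Sketch only: list the XML indexes defined on Orders.
-- using_xml_index_id is NULL for the primary selective index and
-- points at that index for a secondary index built on top of it.
SELECT name, index_id, using_xml_index_id
FROM sys.xml_indexes
WHERE object_id = OBJECT_ID(N'Orders');

-- Sketch only: list the paths promoted by the selective XML index.
SELECT name, path, path_type_desc
FROM sys.selective_xml_index_paths
WHERE object_id = OBJECT_ID(N'Orders');
```

If pathOrderTypeX does not show up in the second result, the secondary index has nothing to be built on.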

And with a query that uses exist(), you get this plan:

 select id from V_Orders where xmldoc.exist('/row/c2/text()[. = "30791"]') = 1 

[Image: execution plan for the exist() query]

The first seek looks up the value you are searching for in the nonclustered index of the internal table. The key lookup is done on the clustered key of the internal table (I don't know why that is necessary). The last seek is on the primary key of the Orders table, followed by a filter that checks for null values in the xmldoc column.

If you can get away with using property promotion, creating indexed computed columns in the Orders table from the XML, I think you will still get better performance than with secondary selective XML indexes.
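A rough sketch of what property promotion could look like for the OrderType path. The function name dbo.fn_Orders_OrderType, the column name OrderTypePromoted and the index name IX_Orders_OrderTypePromoted are made up for illustration, and I am assuming the table is in the dbo schema and that nvarchar(20) is wide enough (it matches the selective index definition in the question). xml data type methods cannot be used directly in a computed column, which is why the value() call is wrapped in a schema-bound scalar function.

```sql
-- Hypothetical helper function: wraps the XQuery so it can be used in a computed column.
-- SCHEMABINDING is needed for the function to count as deterministic, which indexing requires.
CREATE FUNCTION dbo.fn_Orders_OrderType (@doc xml)
RETURNS nvarchar(20)
WITH SCHEMABINDING
AS
BEGIN
    RETURN @doc.value('data(/row/c2)[1]', 'nvarchar(20)');
END;
GO

-- Hypothetical promoted column, persisted so the XML is parsed at write time only.
ALTER TABLE dbo.Orders
    ADD OrderTypePromoted AS dbo.fn_Orders_OrderType(xmldoc) PERSISTED;
GO

-- Ordinary nonclustered index on the promoted value.
CREATE NONCLUSTERED INDEX IX_Orders_OrderTypePromoted
    ON dbo.Orders (OrderTypePromoted);
GO
```

GO is the batch separator used by SSMS and sqlcmd; CREATE FUNCTION has to be the first statement in its batch. For the index to help the queries in the question, the view would have to return the promoted column instead of the value() expression, which is what the asker describes in the edit.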

+4

I followed this example and, strangely enough, I could not get the execution plan to show that it uses either the selective or the secondary index for this node.

0