SQL Server performance with a key/value pair table vs an XML field and XPath

I have already seen a few questions on this topic, but I'm looking for some sense of the performance differences between the two approaches.

For example, say I'm writing a log of events that come into the system, each carrying an arbitrary set of key/value pairs for that particular event. I will write a row to the Events table with the core data, but I also need a way to associate the additional key/value data with it. I will never know in advance which keys or values will arrive, so any predefined enumeration table is out of the question.

This event data will be streamed in constantly, so insert time is as important as query time.

When I query for specific events, I will filter on some fields of the event itself as well as on the key/value data. With the XML approach, I would simply use Attributes.exist('xpath') as part of the WHERE clause to filter records.
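A minimal sketch of what that filter might look like, assuming an Events table with an xml column named Attributes and an element layout of <attrs><attr key="..." value="..."/></attrs> (the column name comes from the question; the layout, other column names, and filter values are my assumptions):

    -- Filter events on a core column plus an attribute stored in the XML blob.
    SELECT  e.EventId, e.EventDate
    FROM    dbo.Events AS e
    WHERE   e.EventType = 'Login'                                    -- core field
      AND   e.Attributes.exist(
                '/attrs/attr[@key="UserName" and @value="jsmith"]'   -- key/value pair
            ) = 1;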

The normalized approach is a table with Key and Value columns and a foreign key back to the Event record. It seems clean and simple, but I'm worried about the sheer volume of data involved.
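For comparison, a sketch of that normalized layout, with illustrative table and column names (nothing here is prescribed by the question):

    CREATE TABLE dbo.Events (
        EventId    bigint IDENTITY(1,1) PRIMARY KEY,
        EventDate  datetime2    NOT NULL,
        EventType  varchar(50)  NOT NULL
    );

    CREATE TABLE dbo.EventAttributes (
        EventId   bigint        NOT NULL REFERENCES dbo.Events (EventId),
        AttrKey   varchar(100)  NOT NULL,
        AttrValue nvarchar(400) NOT NULL,
        PRIMARY KEY (EventId, AttrKey)   -- keeps each event's attributes together
    );

    -- Filtering on an attribute becomes a semi-join instead of an XPath test:
    SELECT  e.EventId, e.EventDate
    FROM    dbo.Events AS e
    WHERE   EXISTS (SELECT 1
                    FROM   dbo.EventAttributes AS ea
                    WHERE  ea.EventId   = e.EventId
                    AND    ea.AttrKey   = 'UserName'
                    AND    ea.AttrValue = N'jsmith');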

Tags: performance, database, xml, sql-server, xpath
3 answers

The problem I see with the key/value table approach is data types: if a value can be a datetime, a Unicode string, a varchar, or an integer, how do you define the value column? That dilemma means the value column has to be a data type that can hold all of the different types, which then raises questions about query efficiency and simplicity. The alternative is to have several columns of specific data types, but I think that is a little awkward.
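Two common ways that dilemma plays out, sketched with hypothetical names purely to illustrate the trade-off:

    -- 1) One loosely typed value column: simple, but every comparison is a
    --    string (or sql_variant) comparison, so numeric/date semantics are lost.
    CREATE TABLE dbo.EventAttributes_Loose (
        EventId   bigint        NOT NULL,
        AttrKey   varchar(100)  NOT NULL,
        AttrValue nvarchar(400) NULL      -- or sql_variant
    );

    -- 2) One column per data type: types are preserved, but only one of the
    --    value columns is ever non-NULL, which makes inserts and queries awkward.
    CREATE TABLE dbo.EventAttributes_Typed (
        EventId       bigint        NOT NULL,
        AttrKey       varchar(100)  NOT NULL,
        ValueString   nvarchar(400) NULL,
        ValueInt      bigint        NULL,
        ValueDateTime datetime2     NULL
    );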

For a truly flexible schema, I can't think of a better option than XML. And you can index XML columns.
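For reference, a sketch of what XML indexing looks like in T-SQL, assuming the Events table with an Attributes xml column described in the question (the index names are mine, and the table must already have a clustered primary key before a primary XML index can be created):

    -- Primary XML index: shreds the XML into an internal node table.
    CREATE PRIMARY XML INDEX IX_Events_Attributes
        ON dbo.Events (Attributes);

    -- Secondary PATH index: helps path-based filters such as exist() predicates.
    CREATE XML INDEX IX_Events_Attributes_Path
        ON dbo.Events (Attributes)
        USING XML INDEX IX_Events_Attributes
        FOR PATH;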

This MSDN article discusses XML storage in more detail.


You have three main options for flexible storage.

  • XML fields are flexible, but put you into blob storage, which is slow to query. I've seen queries against small datasets of 30,000 rows take 5 minutes when they had to dig data out of the blobs with XPath queries. This is the slowest option, but it is flexible.

  • A key/value pair table is much faster, especially if you put the clustered index on the event key, so that all the attributes of a single event are physically stored together in the database, which minimizes I/O. This approach is less flexible than XML but significantly faster. The most efficient reporting queries against it pivot the data, i.e., one scan of the table to build an intermediate flattened result (see the sketch after this list); joining on individual fields is much slower.

  • The fastest approach is a flat table with a set of generic fields (Field1 through Field50) plus metadata describing what each field contains. This is the fastest to insert and the fastest and easiest to query, but the contents of the table are opaque to anything that doesn't have access to the metadata.
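To make the second bullet concrete: if the attribute table is clustered on the event key (for example via a composite primary key on (EventId, AttrKey), as sketched in the question section), a reporting query can flatten the attributes it needs in a single pass instead of self-joining once per attribute. The table, column, and key names below are illustrative:

    -- Pivot-style flattening: one scan, one row per event.
    SELECT  ea.EventId,
            MAX(CASE WHEN ea.AttrKey = 'UserName' THEN ea.AttrValue END) AS UserName,
            MAX(CASE WHEN ea.AttrKey = 'Severity' THEN ea.AttrValue END) AS Severity
    FROM    dbo.EventAttributes AS ea
    GROUP BY ea.EventId;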


I would guess the normalized approach will be faster for both INSERT and SELECT, if only because that is what any RDBMS is optimized for. The "amount of data involved" part can be a concern too, but it is a solvable one: how long do you need this data close at hand, and can you archive it after a day, a couple of weeks, or 3 months? SQL Server can handle a lot.

"This event data will be streamed in constantly, so insert time is as important as query time."

Option 3: if you really do have a lot of streaming data, create a separate queue (in shared memory, in-process SQLite, a separate DB table, or even its own server) to capture the incoming raw events and attributes, and have another process (scheduled task, Windows service, etc.) parse that queue into whatever format you prefer, tuned for fast SELECTs. Optimal input, optimal output, ready to scale in either direction, everybody is happy.
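A minimal sketch of the "separate DB table" flavor of that idea, with hypothetical names: a narrow staging table accepts the raw events as fast as they arrive, and a scheduled job later parses the payload into whichever query-optimized shape (normalized, XML, or flat) was chosen, then removes the processed rows.

    -- Write-optimized landing zone: no foreign keys, no secondary indexes.
    CREATE TABLE dbo.EventQueue (
        QueueId     bigint IDENTITY(1,1) PRIMARY KEY,
        ReceivedAt  datetime2     NOT NULL DEFAULT SYSUTCDATETIME(),
        RawPayload  nvarchar(max) NOT NULL   -- raw event + attributes, parsed later
    );

    -- The background process then drains it in batches, along the lines of:
    --   INSERT INTO the reporting tables ... SELECT ... FROM dbo.EventQueue WHERE QueueId <= @watermark;
    --   DELETE FROM dbo.EventQueue WHERE QueueId <= @watermark;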

