Determining whether to store XML data as XML or in normalized tables

(This post is about personal experience with XML storage, please share what you know. :-))

I am working on a service application that communicates with an external service using XML. I plan to use SQL Server 2008 to store XML that is received and sent to an external service. I am exploring my options for storing XML in a database. The three parameters that I have identified are as follows:

  • Store XML in an XML Data Type Column
  • Create tables to store various parent and child relationships represented in XML.
  • A hybrid of the two above approaches, where the source XML is stored in a column of the XML data type, but several fields from XML are split into their own columns to simplify querying and indexing.

I’m looking for any advice based on your personal experience with storing and retrieving XML data in SQL Server.

Some additional background: I used the 'xsd.exe' equivalent called XsdObjectgenerator to create .net classes based on XML schemas. When a service receives an XML file, it is deserialized into an instance of the .net class. This instance is used to perform service operations. My initial plan was then to use option # 1 above for storing XML. If I needed to update or report data, I would just deserialize the db record back to one of my .net classes.

Despite the fact that this approach works and makes working with xml very simple, I have concerns that as the amount of data increases, the performance of queries against records of XML data type will decrease. This is why I considered options 2. and 3. above.

In addition to storing XML, XML will be queried for use in both reports and a separate web application. Db records will be queried, sorted, filtered, grouped, summarized and possibly updated by end users.

+7
xml sql-server
source share
3 answers

I think it depends on what you want to do with your XML in your database.

If you basically just save it and possibly retrieve it later in general and send it again, then I would definitely use the XML data type - there was no point in chopping it into bits and pieces.

If you need to mainly work with the contents of the XML file and possibly also manipulate and modify this content, then it would be advisable to create tables with columns in accordance with your XML content and destroy it when it is stored, use it, and when you need to , compile it from relational works using something like SELECT (columns) FROM dbo.Table FOR XML.....

There are overheads associated with shredding and reassembly, so you need to ask yourself if it's worth it. But there is also overhead if you need to manipulate the XML column too much.

If you only need read-only access to several attributes in your XML, I have come to understand the ability to wrap them in UDF and direct it as a computed column in your table. That way, you can easily select something from your table based on the values ​​that are stored somewhere inside your XML - pretty handy! But do not abuse this approach - it works great for 2, 3 attributes, but if you need to access your XML (and most or all of it) again and again, then you might be better off crossing it with relational parts to start with.

+5
source share

Continuing to explore solutions, the college redirected the following applicable links:

Some preliminary findings from these articles and other studies:

  • When working with the xml data type in SQL Server, flexible requesting large amounts of data will be slow since you are essentially requesting the blob data type.
  • While you can create indexes in columns of an XML data type in Sql Server, the index is in the entire column, not in a specific element or attribute, so indexes are not as efficient as an index in a db column without xml.
  • Storing xml in raw form in an xml field of the data type, while supporting the analyzed version of the data in either relational tables or a denormalized flat table for queries and reporting, is starting to come out as the most flexible solution. Xml can be "shredded" into query tables at run time or after a separate service or stream.

I will mock every solution with test data and do some benchmarking. I will post the results here when they become available.

+1
source share

A few tasks ago (SQL 2000), we stored XML as TEXT data, and our databases became significantly bloated - not so much with data, but with tags used to identify it. I did some testing, and pkzip (I said it was a few tasks back) crunched all the data to 3% of its original size.

Tip # 1: Determine how much time you need to store data, and if / when archiving old data is possible.

Tip # 2: If you are using SQL 2008, review the data compression options for the XML columns.

(It may not be relevant if your XML files are short, but ours were all in kbs and 10kbs.)

+1
source share

All Articles