Storing XML data in a database: many tables vs. dumping the XML into a single column

I want to save the XML that I receive in my Java web service. Reports will run every 5 minutes to pull some data out of the XML elements.

I thought of two approaches to solving this problem.

  • Create multiple tables in the database to capture the XML data. Basically, each element will have its own column in the database.

  • Dump the whole XML document into a column that can store XML data. For reporting, parse the value in the query itself.

Which of the above approaches is better, especially in terms of performance? This is very important because reports will be generated at a very high frequency (every 5 minutes).

The XML schema is quite complex.

+7
6 answers

If you write the data once and query it many times, it will almost certainly be more efficient to parse the XML document once, store the data in a proper relational schema, and query the relational schema. XML parsing is not cheap, so the overhead of parsing potentially many XML documents every 5 minutes can be significant.
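As a rough illustration of "parse once, store as rows", the sketch below flattens one document into row objects that could then be written with ordinary INSERT statements. The `<order>`/`<item>` structure, the attribute names, and the `OrderRowExtractor` and `ItemRow` classes are assumptions made up for this example; the real schema in the question is unknown and more complex.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: turns one <order> document into flat rows.
final class OrderRowExtractor {

    // Hypothetical row shape: one row per <item> element.
    static final class ItemRow {
        final String orderId;
        final String sku;
        final int quantity;

        ItemRow(String orderId, String sku, int quantity) {
            this.orderId = orderId;
            this.sku = sku;
            this.quantity = quantity;
        }
    }

    // Parses the document once; the returned rows can be queried or
    // inserted into relational tables without touching the XML again.
    static List<ItemRow> toRows(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations to guard against XXE in received XML.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Element order = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();

            String orderId = order.getAttribute("id");
            NodeList items = order.getElementsByTagName("item");
            List<ItemRow> rows = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                rows.add(new ItemRow(
                        orderId,
                        item.getAttribute("sku"),
                        Integer.parseInt(item.getAttribute("qty"))));
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML payload", e);
        }
    }
}
```

Calling `OrderRowExtractor.toRows(xml)` once when the document arrives means the 5-minute report only ever touches the resulting rows, not the XML itself.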

Of course, as with all performance questions, your mileage may vary, so you should test it. If you use Oracle 11.2, store the data as binary XML (in which case it is stored after parsing), and create appropriate XMLIndexes on the XMLType columns you store, the performance penalty of extracting data from the XML document can be quite small. It should still be slower than a proper relational structure, but the difference may not matter to you.

Personally, I would prefer the relational storage approach even ignoring performance altogether, because it makes it easier for others to work with the data. There are many more developers who can write decent SQL than can write decent XPath expressions, and there are many more query tools that can generate reports from relational tables than from XML stored in a database.

+8

Maximus, it really depends on what you want to do with the XML data.

When I use XML for configuration purposes, for example to customize how a page is displayed, I store the whole XML document in a single BLOB field. It is fast and extremely simple: just a save and a load. You can easily view the XML in the BLOB field and edit it.

If you need to search or report on values within the XML, for example how many customers have a particular attribute, you probably want to parse out the individual attributes. This usually means some pre-processing and post-processing, but it lets you get to individual attributes quickly.

+5

Ad hoc access

If you need to run efficient ad hoc or arbitrary queries against the data contained in the XML, you should parse it into tables and columns, which can be indexed and joined logically.

Limited access

If you just store the data and serve it up based on some other criterion, such as a unique identifier or other key, and the XML is essentially an opaque BLOB, then just store it in a BLOB column and be done with it.

Hybrid model

You might need a cross between the two, where the XML is stored in a BLOB and only the relevant bits are also stored in tables and columns, so that you can efficiently search for the XML payload.
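The hybrid model can be sketched with an in-memory stand-in for the database: one map plays the role of the BLOB column holding the untouched payload, and a second map plays the role of an indexed column extracted at save time. The `HybridXmlStore` class and the `id` and `customer` attributes are hypothetical names chosen for this example, and the sketch assumes ids are unique.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical in-memory stand-in for a table with a BLOB column
// plus one extracted, indexed column.
final class HybridXmlStore {
    // Plays the role of the BLOB column: raw XML keyed by document id.
    private final Map<String, String> payloadById = new HashMap<>();
    // Plays the role of an indexed "customer" column: customer -> document ids.
    private final Map<String, List<String>> idsByCustomer = new HashMap<>();

    // Parses the payload once, extracts only the searchable bits,
    // and keeps the full XML untouched. Assumes each id is saved once.
    void save(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations to guard against XXE.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            String id = root.getAttribute("id");
            String customer = root.getAttribute("customer");
            payloadById.put(id, xml);
            idsByCustomer.computeIfAbsent(customer, k -> new ArrayList<>()).add(id);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML payload", e);
        }
    }

    // Looks up payloads through the extracted key; no XML parsing happens here.
    List<String> findByCustomer(String customer) {
        List<String> result = new ArrayList<>();
        for (String id : idsByCustomer.getOrDefault(customer, List.of())) {
            result.add(payloadById.get(id));
        }
        return result;
    }
}
```

In a real database the two maps would be a BLOB (or XML) column and an ordinary indexed column in the same row; the point is only that searches go through the extracted column while the payload stays opaque.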

+4

Without knowing a little more, it’s hard to say for sure, but most likely you are missing one important part that can make life much easier.

  • Bind the XML to POJOs (JAXB, MOXy, or JiBX)
  • Store the POJOs as normalized columns (use jDBI, Hibernate, or even simple JDBC templates)

In addition, depending on what kind of reports you are generating, consider simply keeping the data in memory. Running every 5 minutes does not sound performance-critical, but then again persistence is not always necessary (or may be needed only for historical data or backups).

+1

If you need to store and query more than a couple of XML documents, you should use an XML database.

eXist is a nice one. Keeping the XML documents in a column or splitting them across many tables is a bad option, I think.

+1

You can also look at the xml data type in SQL Server or XMLType in Oracle: http://msdn.microsoft.com/en-us/library/hh403385.aspx

You can create computed columns over the XML column for the XML fields that are queried most often, which speeds up retrieval. To get the value at a specific XPath, you just pass the XPath to SQL Server and it returns the value at that path.
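To get a feel for what "pass an XPath, get a value back" looks like, here is a minimal sketch using the JDK's own `javax.xml.xpath` package. It is not the SQL Server API, only the same idea on the Java side; the `XPathValueReader` class and the sample paths are hypothetical, and the input is assumed to come from a trusted source.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates one XPath expression against one document.
final class XPathValueReader {

    // Returns the string value at the given path, or an empty string
    // when nothing matches. The document is parsed on every call,
    // which is exactly the cost the relational approach avoids.
    static String valueAt(String xml, String xpathExpression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(xpathExpression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath or XML", e);
        }
    }
}
```

For a document such as `<order id="42"><customer><name>Acme</name></customer></order>`, the path `/order/customer/name` selects the customer name and `/order/@id` selects the id attribute.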

0
