Import huge XML data (> 1 GB) into SQL Server 2008 daily

I need to import a huge XML file (> 1 GB) into SQL Server 2008 every day. I have a sample XML file and its XML schema. The schema is quite complex: it contains many custom simple types and elements with complex types, for example:

<xs:element name="xxxx_url">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:anyURI">
        <xs:attribute ref="target" use="optional"/>
        <xs:attribute ref="abc" use="optional"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
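
To make the mapping question concrete, here is a minimal Python sketch of how an element like this could map to one object (and, by extension, one table row): the xs:anyURI text becomes a value field and each optional attribute becomes a nullable field. The class name, the function name, and the sample data are hypothetical, and the sketch assumes the attributes are not namespace-qualified.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class XxxxUrl:
    """Hypothetical class for xxxx_url: URI text plus two optional attributes."""
    value: str
    target: Optional[str] = None
    abc: Optional[str] = None


def parse_xxxx_url(elem: ET.Element) -> XxxxUrl:
    # simpleContent: the element body is the xs:anyURI value itself.
    # Element.get returns None when an optional attribute is absent.
    return XxxxUrl(
        value=(elem.text or "").strip(),
        target=elem.get("target"),
        abc=elem.get("abc"),
    )


# Invented sample data, not taken from the real feed.
sample = ET.fromstring('<xxxx_url target="_blank">http://example.com/a</xxxx_url>')
url = parse_xxxx_url(sample)
```

The same three fields would be the natural columns of a table for this element, with target and abc nullable.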

After the import, a WCF service will be implemented to retrieve the data stored in SQL Server: search, retrieval, and so on (read-only operations).

The steps I can think of are:

  • Determine the object model from the provided XSD (manually). The WCF service will use this object model for its return values.
  • Define the database schema from the provided XSD (manually). By my estimate, the schema will have about 20-30 tables.
  • Create an SSIS package to load the XML into the database daily.
  • Create a WCF service that reads from the database, populates the object model defined in the first step, and returns the objects to the service client.

The problem is that these steps involve a lot of manual work. I have to go through the XSD line by line and manually convert it into an object model and a database schema.

I did some research and found automation tools that convert an XSD to classes, and others that convert an XSD to a database schema. But the classes the tool generated from the XSD are rather confusing, and the conversion to a database schema failed because the XSD does not conform to the MS DataSet format.

Is there a good solution to this problem that would save a lot of the manual work?

Any suggestions are appreciated!

+7
c# xml sql-server xsd ssis
5 answers

At some point you need to do a transformation, whether you read the XML into objects or into rows in tables. The work has to be done once; after that you just run the resulting process. I see the following problems:

  • XML is very large.

  • You do not yet have an XSD mapping to your desired schema.

The mapping is the work you will need to do. I think it would be best to import the XML into a staging table and then import from that temporary table into the schema you want to use. Working with the XML file directly will give you problems because of its size.

So my suggestion is to coerce/massage the XML import into something that works with a table structure, then write a stored procedure to "import" the data from those tables into your "real" schema.
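
As a rough illustration of that two-step idea, the sketch below loads loosely typed rows into a staging table and then moves them into typed target tables with set-based SQL. It uses Python's built-in sqlite3 purely as a stand-in for SQL Server, and every table and column name (staging_item, item, item_url) is made up; on SQL Server the second step would live in a stored procedure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server database
conn.executescript("""
    CREATE TABLE staging_item (item_id TEXT, name TEXT, url TEXT);
    CREATE TABLE item (item_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE item_url (
        item_id INTEGER NOT NULL REFERENCES item(item_id),
        url TEXT NOT NULL
    );
""")

# Step 1: bulk-load untyped rows, one per URL, as they came out of the XML.
staged = [
    ("1", " Widget ", "http://example.com/a"),
    ("1", " Widget ", "http://example.com/b"),
    ("2", "Gadget", None),
]
conn.executemany("INSERT INTO staging_item VALUES (?, ?, ?)", staged)

# Step 2: the "import" step (a stored procedure on SQL Server):
# clean, type, and de-duplicate while copying into the real schema.
conn.executescript("""
    INSERT INTO item (item_id, name)
    SELECT DISTINCT CAST(item_id AS INTEGER), TRIM(name) FROM staging_item;

    INSERT INTO item_url (item_id, url)
    SELECT CAST(item_id AS INTEGER), url FROM staging_item
    WHERE url IS NOT NULL;

    DELETE FROM staging_item;
""")
conn.commit()

items = conn.execute("SELECT item_id, name FROM item ORDER BY item_id").fetchall()
urls = conn.execute("SELECT item_id, url FROM item_url ORDER BY url").fetchall()
left_in_staging = conn.execute("SELECT COUNT(*) FROM staging_item").fetchone()[0]
```

Keeping the staging table untyped means a bad value fails in the second step, where it can be inspected with plain SQL, rather than aborting the load itself.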

Pat o

+3

Try splitting the XML into more than one file, because otherwise you may run into problems later, when garbage such as ýÿơƒƈï shows up in the database because of loading errors.
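
One possible way to do the split, sketched in Python with the standard library only. The record tag, the wrapper element name, the part file names, and the sample data are all assumptions. Each part is written as UTF-8 with an explicit XML declaration, so the encoding of every file is stated rather than guessed at load time.

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET


def _write_part(batch, out_dir, index):
    """Wrap a batch of record elements in a new root and write one file."""
    root = ET.Element("records")
    root.extend(batch)
    path = os.path.join(out_dir, f"part_{index:04d}.xml")
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
    return path


def split_xml(source, out_dir, record_tag="record", per_file=2):
    """Stream through `source` and write every `per_file` records to a file."""
    paths, batch, source_root = [], [], None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if source_root is None:
            source_root = elem  # the first start event is the document root
        if event == "end" and elem.tag == record_tag:
            batch.append(elem)
            if len(batch) == per_file:
                paths.append(_write_part(batch, out_dir, len(paths)))
                batch = []
                source_root.clear()  # drop written records to keep memory flat
    if batch:  # leftover records that did not fill a whole part
        paths.append(_write_part(batch, out_dir, len(paths)))
    return paths


# Invented sample: five records containing a non-ASCII character.
records = "".join(f'<record id="{i}">ý{i}</record>' for i in range(5))
data = io.BytesIO(f"<records>{records}</records>".encode("utf-8"))

with tempfile.TemporaryDirectory() as tmp:
    parts = split_xml(data, tmp)
    counts = [len(ET.parse(p).getroot()) for p in parts]
    first_text = ET.parse(parts[0]).getroot()[0].text
```

With smaller parts, a load that fails or produces garbage can be traced to one file and rerun on its own.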

0

In short, your solution will require some work - there is no quick fix.

For scalability, I would recommend a technique that lets you stream through the XML (à la SAX) rather than trying to load and shred it all in RAM. There is not much value in converting the XML to an object graph for SSIS purposes, so consider one of the following options:

  • Stream and shred the XML document using a custom script with multiple outputs, and then use the other SSIS components to transform the resulting data.
  • Bulk load (stream) the XML into a staging SQL Server instance, then query the XML from there (not a great solution, but simpler than the first option), or
  • Pre-split the document into smaller pieces, load these pieces into a staging area, then work with them separately using XSD transforms, etc. This opens the door to better parallelism.
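
The first option can be sketched outside SSIS as follows. This is a minimal Python illustration of streaming and shredding with multiple outputs, not an SSIS script component; the item, name, and id names and the sample document are hypothetical, while xxxx_url and target come from the schema fragment in the question.

```python
import io
import xml.etree.ElementTree as ET


def shred(source, record_tag="item"):
    """Yield (output_name, row) pairs while streaming through the XML."""
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if root is None:
            root = elem  # the first start event is the document root
        if event == "end" and elem.tag == record_tag:
            item_id = int(elem.get("id"))
            # Output 1: one row per item.
            yield "item", (item_id, elem.findtext("name"))
            # Output 2: one row per child URL, keyed by the parent id.
            for url in elem.iterfind("xxxx_url"):
                yield "item_url", (item_id, url.text, url.get("target"))
            root.clear()  # discard the finished record so memory stays flat


# Invented sample document.
sample = io.BytesIO(b"""<items>
  <item id="1"><name>Widget</name>
    <xxxx_url target="_blank">http://example.com/a</xxxx_url>
    <xxxx_url>http://example.com/b</xxxx_url></item>
  <item id="2"><name>Gadget</name></item>
</items>""")

outputs = {"item": [], "item_url": []}
for name, row in shred(sample):
    outputs[name].append(row)
```

In a real load, each output would feed a batched bulk insert into its own staging table instead of an in-memory list.
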
0

Do you have sample data that you can post, with at least one complete set of data values?

Also, do you have access to the source database that was used to create this XML data? XML is not designed for data transfers of this size - your task would be much easier with the data in flat files, one per table.

0

SQL Server has a built-in XML data type, and it can create tables from your schema.

Would that help you?

0