How to convert data stored in XML files to a relational database (MySQL)?

I have several XML files containing data for a research project in which I need to run some statistics. The amount of data is close to 100 GB.

The structure is not very complex (it would map to perhaps 10 tables in a relational model). Given the nature of the problem, this data will never be updated again; I only need it in a place where it is easy to run queries.

I have read about XML databases and the ability to run XPath-style queries, but I have never used them and I'm not comfortable with them. Having the data in a relational database is my preferred choice.

So, I'm looking for a way to convert the data stored in XML into a relational database (think of a large .sql file, similar to the one mysqldump generates, but anything else would do). The ultimate goal is to be able to run SQL queries to crunch the data.

After some research, I'm pretty sure I will have to write this myself. But I believe this is a common problem, so there should be a tool that already does this.

So, do you know about any tool that converts XML data into a relational database?

PS1:

My idea would be something like (it might work differently, but just to make sure you understand my point):

  • Analyze the data structure (based on the XML itself or on an XSD)
  • Derive a relational schema (tables, keys) from this structure
  • Generate the SQL statements that create the database
  • Generate the SQL statements that populate it with the data
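The steps above can be sketched in Python for a single flat record type. This is only an illustration under strong assumptions: the helper names (`infer_columns`, `sql_literal`, `xml_to_sql`) and the sample data are made up, every column is typed as TEXT, no keys are derived, and the whole document is parsed into memory, which would not work as-is for 100 GB of input.

```python
import xml.etree.ElementTree as ET

def infer_columns(root, record_tag):
    """Collect, in first-seen order, the child tags used by <record_tag> elements."""
    columns = []
    for record in root.iter(record_tag):
        for child in record:
            if child.tag not in columns:
                columns.append(child.tag)
    return columns

def sql_literal(value):
    """Quote a value as a MySQL string literal; a missing value becomes NULL."""
    if value is None:
        return "NULL"
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"

def xml_to_sql(xml_text, record_tag, table):
    """Return a CREATE TABLE statement followed by one INSERT per record."""
    root = ET.fromstring(xml_text)
    columns = infer_columns(root, record_tag)
    col_defs = ", ".join(f"`{c}` TEXT" for c in columns)  # assumption: TEXT everywhere
    statements = [f"CREATE TABLE `{table}` ({col_defs});"]
    for record in root.iter(record_tag):
        values = ", ".join(sql_literal(record.findtext(c)) for c in columns)
        statements.append(f"INSERT INTO `{table}` VALUES ({values});")
    return statements

# Hypothetical sample input; the second record has no <age> element.
sample = """<people>
  <person><name>Ann</name><age>31</age></person>
  <person><name>O'Neil</name></person>
</people>"""

for stmt in xml_to_sql(sample, "person", "person"):
    print(stmt)
```

A real converter would also need to map nested elements to child tables with foreign keys and pick proper column types, which is where an XSD, if available, would help.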

PS2:

I saw a few posts here on SO, but I still could not find a solution. Microsoft "XML Bulk Load" seems to do something along these lines, but I don't have MS SQL Server.

+6
5 answers

Databases are not the only way to search data. I can highly recommend Apache Solr.

Keep your raw data in XML format and search it using a Solr index.

+2

It is easy to import XML files that are in the required format into a MySQL database:

https://dev.mysql.com/doc/refman/5.6/en/load-xml.html

This means that you usually need to convert your XML data to this format first. How you do this depends on the complexity of the conversion, which programming languages you know, and whether you want to use XSLT (which is most likely a good idea).
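As a rough sketch of such a conversion, the snippet below reshapes records into the `<row><field name="...">value</field></row>` layout, which is one of the formats LOAD XML accepts. It uses Python's standard `xml.etree.ElementTree` as a stand-in for XSLT; the function name `to_load_xml_rows` and the sample `<order>` data are hypothetical, and it assumes each record is flat (attributes plus leaf child elements).

```python
import xml.etree.ElementTree as ET

def to_load_xml_rows(source_xml, record_tag):
    """Reshape each <record_tag> element into <row><field name="...">...</field></row>."""
    source = ET.fromstring(source_xml)
    resultset = ET.Element("resultset")
    for record in source.iter(record_tag):
        row = ET.SubElement(resultset, "row")
        # Attributes of the record become columns...
        for name, value in record.attrib.items():
            ET.SubElement(row, "field", name=name).text = value
        # ...and so do its direct (leaf) child elements.
        for child in record:
            ET.SubElement(row, "field", name=child.tag).text = (child.text or "").strip()
    return ET.tostring(resultset, encoding="unicode")

# Hypothetical sample input.
sample = '<orders><order id="7"><customer>Ann</customer><total>12.50</total></order></orders>'
print(to_load_xml_rows(sample, "order"))
```

The output, saved to a file, could then be loaded with a statement along the lines of `LOAD XML LOCAL INFILE 'orders_rows.xml' INTO TABLE orders ROWS IDENTIFIED BY '<row>';`, where the file and table names are placeholders and the table must already exist with matching column names.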

From your past answers, it seems like you know Python, so http://xmlsoft.org/XSLT/python.html may be right for you.

+2

Take a look at StAX instead of XSD for data analysis / extraction. It is stream-based and can handle huge XML files.
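StAX itself is a Java API. To illustrate the same stream-based idea in Python, here is a minimal sketch that uses `xml.etree.ElementTree.iterparse` instead (a different API from StAX, but likewise incremental). The function name `stream_records` and the sample `<entry>` records are assumptions made up for the example.

```python
import io
import xml.etree.ElementTree as ET

def stream_records(source, record_tag):
    """Yield one dict per <record_tag> element while parsing incrementally."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            record = {child.tag: child.text for child in elem}
            # Drop the record's children and text so they can be garbage-collected.
            # The emptied element itself stays attached to its parent.
            elem.clear()
            yield record

# Hypothetical sample input; a real run would pass a file path or open file.
sample = io.BytesIO(
    b"<log><entry><id>1</id><msg>a</msg></entry>"
    b"<entry><id>2</id><msg>b</msg></entry></log>"
)
for record in stream_records(sample, "entry"):
    print(record)
```

Each yielded record could be turned into an INSERT or written to a bulk-load file. For very large inputs the emptied elements left on the root may also need to be detached to keep memory flat.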

+2

If you are comfortable with Perl, I've had very good luck with the XML::Twig module for processing really large XML files.

Basically, you only need to set up a few twig handlers and import your data into MySQL using DBI / DBD::mysql.

There is a pretty good example at xmltwig.org.

+1

If you are comfortable using commercial products, you could look at Data Wizard for MySQL from the SQL Maestro team.

This application is designed specifically for exporting data from, and of course importing data into, MySQL databases, and that includes XML imports. You can download a 30-day trial to see whether it is really what you are looking for.

I must admit that I have not used their MySQL product line yet, but I have had a good user experience with their Firebird Maestro and SQLite Maestro products.

+1
