How to merge large XML files using MSXML SAX in Delphi

Edit: My (incomplete and very crude) translation of the XmlLite headers is available on GitHub.

What is the best way to easily combine massive XML documents in Delphi with MSXML without using the DOM? Should I use SAXReader and XMLWriter COM components and are there any good examples?

The transformation is simple: combine all the Contents elements under the root (Container) from a large number of big files (60 MB+) into one huge file (~1 GB).

    <Container>
      <Contents />
      <Contents />
      <Contents />
    </Container>

I have it working in the following C# code using XmlWriter and XmlReader, but it needs to happen in a native Delphi process:

    using System.Xml;

    var files = new string[] {
        @"c:\bigFile1.xml", @"c:\bigFile2.xml", @"c:\bigFile3.xml",
        @"c:\bigFile4.xml", @"c:\bigFile5.xml", @"c:\bigFile6.xml"
    };

    using (var writer = XmlWriter.Create(@"c:\HugeOutput.xml", new XmlWriterSettings { Indent = true }))
    {
        writer.WriteStartElement("Container");
        foreach (var inputFile in files)
            using (var reader = XmlReader.Create(inputFile))
            {
                reader.MoveToContent(); // position on the input's root element
                while (!reader.EOF)
                {
                    // WriteNode already advances past the copied element, so call
                    // Read() only when nothing was copied; otherwise an adjacent
                    // <Contents> sibling could be skipped.
                    if (reader.IsStartElement("Contents"))
                        writer.WriteNode(reader, true);
                    else
                        reader.Read();
                }
            }
        writer.WriteEndElement(); // end the Container element
    }

We already use the MSXML DOM in other parts of the system, and I do not want to add new components, if possible.
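For illustration only, the SAX shape of this merge can be sketched in Python, whose standard xml.sax module has the same split between a reader, a content handler and a writer that itself consumes SAX events. A small handler swallows each input's root element and forwards everything below it to one shared writer. The names `ChildForwarder` and `merge` are made up for this sketch, and it is not a tested Delphi implementation; in MSXML the corresponding pieces would presumably be the SAXXMLReader and MXXMLWriter coclasses.

```python
import xml.sax
from xml.sax.saxutils import XMLGenerator


class ChildForwarder(xml.sax.ContentHandler):
    """Forward everything below the input's root element to a shared writer.

    The root start/end tags and the per-document start/end events are
    swallowed, so many inputs can feed a single output document.
    """

    def __init__(self, writer):
        super().__init__()
        self.writer = writer
        self.depth = 0

    def startElement(self, name, attrs):
        if self.depth > 0:  # depth 0 is the input's own root: skip it
            self.writer.startElement(name, attrs)
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1
        if self.depth > 0:
            self.writer.endElement(name)

    def characters(self, content):
        if self.depth > 1:  # keep text inside children, drop root-level whitespace
            self.writer.characters(content)


def merge(sources, out, root="Container"):
    """Write one <root> element holding the children of every source's root."""
    writer = XMLGenerator(out, encoding="utf-8", short_empty_elements=True)
    writer.startDocument()
    writer.startElement(root, {})
    for source in sources:  # file names or open binary streams
        xml.sax.parse(source, ChildForwarder(writer))
    writer.endElement(root)
    writer.endDocument()
```

Only the current parser buffer is held in memory at any time, which is the property that matters for a ~1 GB output. The sketch assumes every input uses the same root element and no namespaces.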

+7
4 answers

XmlLite is a native C++ port of the XML reader and writer from System.Xml, and it provides the pull-parsing programming model. It ships in the box with Windows Server 2003 SP2, Windows XP SP3 and later. You will need a Delphi translation of its headers; after that, the C# code maps almost 1:1 to Delphi.
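To show what the pull model looks like without a Delphi header translation at hand, here is a rough sketch using Python's standard xml.dom.pulldom as a stand-in. It is not the XmlLite API; the function name `copy_contents` is hypothetical. The caller asks for the next event, and on a matching start tag pulls in just that subtree and writes it out, much like `WriteNode(reader, true)` in the C# code.

```python
from xml.dom import pulldom


def copy_contents(src, dst, tag="Contents"):
    """Write every <tag> element found in src to dst as serialized XML text."""
    events = pulldom.parse(src)  # src: file name or open binary stream
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # Pull the rest of this one element (children, text, end tag)
            # from the stream; the loop then resumes right after it.
            events.expandNode(node)
            dst.write(node.toxml())
```

A caller would write the opening Container tag, call `copy_contents` once per input file, and then write the closing tag.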

+3

I would just use normal file I/O to write the output as a text file: write the opening tag, stream each file's contents out as strings, and finally write the closing tag. If the size were more reasonable, I would assemble everything in a string list and then stream that to disk. But once you are into gigabytes, that would be risky.
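A minimal sketch of that idea, written in Python for brevity (the same byte-level logic should carry over to Delphi streams). It holds one input in memory at a time and never the whole output. The names `append_body` and `merge_as_text` are made up here, and the sketch assumes all files share one encoding, that the opening root tag contains no `>` inside attribute values, and that no input has an empty self-closing root.

```python
def append_body(src, dst, root=b"Container"):
    """Copy the bytes between <root ...> and </root> from src to dst."""
    data = src.read()                    # one input at a time, not the whole output
    start = data.index(b"<" + root)      # skips any XML declaration before the root
    start = data.index(b">", start) + 1  # first byte after the opening root tag
    end = data.rindex(b"</" + root)      # start of the closing root tag
    dst.write(data[start:end])


def merge_as_text(sources, dst, root=b"Container"):
    """Wrap the bodies of all sources in a single <root> element."""
    dst.write(b"<" + root + b">")
    for src in sources:                  # streams opened in binary mode
        append_body(src, dst, root)
    dst.write(b"</" + root + b">")
```

Because nothing is parsed, this is fast but offers no well-formedness checking; a malformed input produces a malformed output.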

+1

libxml2 with the Delphi wrapper Libxml2 might be an option (found here). It has some SAX support and seems to be very robust: the web page mentions that libxml2 passes all 1800+ tests from the OASIS XML Tests Suite. See also: Is there a SAX Parser for Delphi and Free Pascal?

+1

Putting this as an answer because it needs some space and formatting.

I have one baaad data file for tests; see the commit at https://github.com/the-Arioch/omnixml/commit/d1a544048e86921983fced67c772944f12cb1427

With that file, OmniXML seems to fare poorly in an XE2 debug build:

  • more than 25% higher memory usage than TXmlDocument / MSXML; perhaps even more now that the .NextSibling problem is fixed (I did not re-check)
  • longer file load time (OTOH it reads node properties much faster: they are already Delphi-typed variables, with no crossing of the MSXML / Delphi boundary)
  • absolutely no namespace support, which makes tag recognition harder
  • XPath support in an embryonic state, again including the lack of namespaces

https://docs.google.com/spreadsheets/d/1QcFVwh3fFfaDyRmv2b-n4Rq4_u5p42UfNbR_FZgZizY/edit?usp=sharing

0
