XSLTProcessor xmlSAX2Characters: out of memory

I have a page that downloads a 500 MB XML file and processes it with an XSL template. The parser works fine in my local environment (WAMP), but on the web server I get the following error:

Warning: DOMDocument::load() [domdocument.load]: (null) xmlSAX2Characters: out of memory in /home/mydomain/public_html/xslt/largeFile.xml, line: 2031052 in /home/mydomain/public_html/xslt/parser_large.php on line 6

My code looks like this; line 6 is the line that loads the XML file:

<?php
$xslDoc = new DOMDocument();
$xslDoc->load("template.xslt");

$xmlDoc = new DOMDocument();
$xmlDoc->load("largeFile.xml");

$proc = new XSLTProcessor();
$proc->importStylesheet($xslDoc);
echo $proc->transformToXML($xmlDoc);
?>

I tried copying the php.ini file from my WAMP installation into the folder that contains the code above, but it did not help. That php.ini sets memory_limit = 1000M.
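On shared hosting a dropped-in php.ini is often ignored entirely, so before anything else it is worth checking which limit the server actually applies. A minimal sketch (the 1024M value is just an example, not a recommendation):

```php
<?php
// Print the memory limit actually in effect. A copied php.ini is
// frequently ignored on shared hosts, so don't trust the file's contents.
echo ini_get('memory_limit'), "\n";

// Some hosts also allow raising the limit at runtime; ini_set() returns
// false if the change is not permitted.
if (ini_set('memory_limit', '1024M') !== false) {
    echo 'limit is now ', ini_get('memory_limit'), "\n";
}
```

Note that even 1000M may simply not be enough: with the DOM, per-node overhead means memory consumption is typically several times the size of the file on disk.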

Any advice / experience on this matter would be greatly appreciated.

1 answer

Here is the sad truth. There are two main ways to work with XML: DOM-based parsing, where the entire document is held in memory at once (with significant overhead to speed up traversal), and SAX-based parsing, where the document streams through memory and only a small part of it is present at any one time.

However, with the DOM, high memory consumption is pretty much normal.

XSLT, as it stands, lets a stylesheet access any part of the document at any time, and therefore generally requires DOM-style processing. Some programming languages have libraries that feed SAX input into an XSLT processor, but that necessarily either restricts which XSLT constructs you can use or is not much better than the DOM in memory terms. In any case, PHP has no way to run XSLT over SAX input.

That leaves the alternatives to the DOM, and PHP has one called SimpleXML. SimpleXML is a little awkward to use if your document has namespaces. Old benchmarks suggest it is somewhat faster, and probably also less wasteful of memory, than the DOM on large files.
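As an illustration only, iterating a document with SimpleXML looks like this. The `item`/`name` element names are placeholders I made up, not part of the question:

```php
<?php
// SimpleXML sketch. It still loads the whole document into memory,
// just with somewhat less per-node overhead than the DOM.
// <item> and <name> are placeholder element names.
function itemNames(SimpleXMLElement $root): array
{
    $names = [];
    foreach ($root->item as $item) {        // iterate child <item> elements
        $names[] = (string) $item->name;    // the cast extracts text content
    }
    return $names;
}

// For a real file you would call simplexml_load_file('largeFile.xml').
$doc = simplexml_load_string(
    '<root><item><name>a</name></item><item><name>b</name></item></root>'
);
print_r(itemNames($doc));
```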

Finally, I was in your position once, in another programming language. The solution was to split the document into small ones following simple rules: each small document contained a header copied from the full document, one "detail" element, and a footer, so that it was still valid against the big file's schema. Each piece was then processed with XSLT (assuming the processing of one detail element does not depend on any other) and the outputs were concatenated. This works like a charm, but it does not finish in seconds.
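PHP's closest built-in analog of that split-and-transform approach is XMLReader, which streams the file and can materialize a single element as a small DOM subtree via expand(). A sketch, assuming the repeating record element is called `item` (a placeholder name, not from the question) and that each record can be transformed independently:

```php
<?php
// Stream the big file with XMLReader and transform one record at a
// time, so only a single <item> subtree is ever in memory.
function transformByFragment(string $xmlFile, string $xslFile, string $record): string
{
    $xsl = new DOMDocument();
    $xsl->load($xslFile);
    $proc = new XSLTProcessor();
    $proc->importStylesheet($xsl);

    $reader = new XMLReader();
    $reader->open($xmlFile);

    // advance to the first record element
    while ($reader->read()
        && !($reader->nodeType === XMLReader::ELEMENT
             && $reader->localName === $record));

    $out = '';
    while ($reader->localName === $record) {
        // expand() materializes just this one element as a DOM node;
        // importNode() copies it into a fresh single-record document
        $doc = new DOMDocument();
        $doc->appendChild($doc->importNode($reader->expand(), true));
        $out .= $proc->transformToXml($doc);
        $reader->next($record);   // jump to the next sibling record
    }
    $reader->close();
    return $out;
}

if (is_file('largeFile.xml')) {
    echo transformByFragment('largeFile.xml', 'template.xslt', 'item');
}
```

The per-fragment outputs are simply concatenated here; with `xsl:output method="xml"` you would also want to strip each fragment's XML declaration before joining them.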

So, here are your options. Choose one.

  • Parse and process the XML with SAX.
  • Use SimpleXML and hope it lets you fit somewhat larger files into the same memory.
  • Run an external XSLT processor and hope it fits somewhat larger files into the same memory.
  • Split and re-merge the XML as described above and apply XSLT only to the small fragments. This approach only works with some schemas.
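For the external-processor option, a sketch of shelling out to xsltproc from PHP. That xsltproc is installed on the server is an assumption, as are the file names:

```php
<?php
// Build a shell command for an external XSLT processor such as
// xsltproc (libxslt). escapeshellarg() protects the file names.
function xsltCommand(string $xslFile, string $xmlFile): string
{
    return sprintf('xsltproc %s %s',
        escapeshellarg($xslFile),
        escapeshellarg($xmlFile));
}

// exec() collects the output lines and the external process's exit status.
exec(xsltCommand('template.xslt', 'largeFile.xml'), $lines, $status);
if ($status === 0) {
    echo implode("\n", $lines);
} else {
    fwrite(STDERR, "external transform failed (status $status)\n");
}
```

The external process still builds the whole tree in memory by default, but it uses its own address space rather than PHP's, so it is not bound by memory_limit at all; on some hosts that alone is the fix.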
