I have a large (1.9 GB) XML file with data that I want to insert into a MySQL database every month. For this, I wrote an Ant script.
The Ant XSLT task cannot process a single file that large, so I have a target that uses xml_split (from xml-twig-tools) to split the 1.9 GB file into smaller XML files of about 4 MB each.
That part works fine.
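For context, the splitting step could look roughly like the sketch below. This is not my exact target: the property source.xml is a hypothetical name for the big input file, and the -s size option is written from memory of the xml_split documentation, so check it against your installed version.

```xml
<!-- Sketch only: split the large input file into chunks of roughly 4 MB. -->
<!-- "source.xml" is a hypothetical property holding the input file name. -->
<target name="split" description="Split the large XML file into small chunks...">
  <exec executable="xml_split" dir="${import.dir}" failonerror="true">
    <!-- Assumed option: -s sets the approximate chunk size. -->
    <arg value="-s"/>
    <arg value="4Mb"/>
    <arg value="${source.xml}"/>
  </exec>
</target>
```

Setting failonerror="true" makes the build stop if xml_split exits with an error, so the transform target never runs on a partial set of chunks.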
I use the following Ant target to run the XSLT task on all of these XML files:
<target name="xsltransform" depends="split" description="Transform XML to SQL...">
  <xslt basedir="${import.dir}/" destdir="${import.dir}/sql/" style="${xsl.filename}" force="true">
    <mapper type="glob" from="*.xml" to="*.sql" />
    <factory name="net.sf.saxon.TransformerFactoryImpl"/>
  </xslt>
</target>
The problem is that, starting with the first XML file, I see the "RES" memory in Linux grow with every subsequent XML file. Since the task processes separate (unrelated) XML files, I would expect it to free memory between transformations. It does not: after about two hundred 4 MB XML files, Java throws an out-of-memory error:
BUILD FAILED
/var/lib/hudson/jobs/EPDB_Rebuild_Monthly/workspace/trunk/buildfiles/buildMonthly.xml:67: java.lang.OutOfMemoryError: Java heap space
    at net.sf.saxon.tinytree.TinyTree.ensureNodeCapacity(Unknown Source)
    at net.sf.saxon.tinytree.TinyTree.addNode(Unknown Source)
    at net.sf.saxon.tinytree.TinyBuilder.startElement(Unknown Source)
    at net.sf.saxon.event.Stripper.startElement(Unknown Source)
    at net.sf.saxon.event.ReceivingContentHandler.startElement(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
    at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at net.sf.saxon.event.Sender.sendSAXSource(Unknown Source)
    at net.sf.saxon.event.Sender.send(Unknown Source)
    at net.sf.saxon.event.Sender.send(Unknown Source)
    at net.sf.saxon.Controller.transform(Unknown Source)
    at org.apache.tools.ant.taskdefs.optional.TraXLiaison.transform(TraXLiaison.java:194)
    at org.apache.tools.ant.taskdefs.XSLTProcess.process(XSLTProcess.java:812)
    at org.apache.tools.ant.taskdefs.XSLTProcess.execute(XSLTProcess.java:408)
    at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
    at org.apache.tools.ant.Task.perform(Task.java:348)
    at org.apache.tools.ant.Target.execute(Target.java:390)
    at org.apache.tools.ant.Target.performTasks(Target.java:411)
    at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1360)
    at org.apache.tools.ant.Project.executeTarget(Project.java:1329)
Is there something I can do to prevent the XSLT task from using up all of my memory? Or should I reconsider my approach?