Ant XSLT task over a set of files runs out of memory / does not free memory

I have a large (1.9 GB) XML file with data that I want to insert into a MySQL database every month. For this, I wrote an Ant script.

The Ant XSLT task cannot process a single file that large, so I have a task that uses xml_split (from xml-twig-tools) to split the 1.9 GB file into smaller XML files of about 4 MB each.
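The split target itself is not shown here. A minimal sketch of what such a target might look like, assuming xml_split is on the PATH and that its `-s` option (approximate chunk size) is used; the target body and the `big.xml.filename` property are hypothetical:

```xml
<!-- Hypothetical split target; big.xml.filename is an assumed property
     naming the 1.9 GB input file inside ${import.dir}. -->
<target name="split" description="Split the large XML file into ~4 MB chunks">
  <exec executable="xml_split" dir="${import.dir}" failonerror="true">
    <!-- -s asks xml_split for chunks of approximately the given size -->
    <arg value="-s"/>
    <arg value="4Mb"/>
    <arg value="${big.xml.filename}"/>
  </exec>
</target>
```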

That part works well.

I use the following Ant XML to run the XSLT task on all of these XML files:

    <target name="xsltransform" depends="split" description="Transform XML to SQL...">
      <xslt basedir="${import.dir}/" destdir="${import.dir}/sql/" style="${xsl.filename}" force="true">
        <mapper type="glob" from="*.xml" to="*.sql" />
        <factory name="net.sf.saxon.TransformerFactoryImpl"/>
      </xslt>
    </target>

The problem is that as soon as it starts on the first XML file, I see the "RES" memory in Linux grow with every subsequent XML file. Since it processes separate (unrelated) XML files, I would expect it to free memory between transformations. It does not: after about two hundred 4 MB XML files, Java throws an out-of-memory error:

    BUILD FAILED
    /var/lib/hudson/jobs/EPDB_Rebuild_Monthly/workspace/trunk/buildfiles/buildMonthly.xml:67: java.lang.OutOfMemoryError: Java heap space
        at net.sf.saxon.tinytree.TinyTree.ensureNodeCapacity(Unknown Source)
        at net.sf.saxon.tinytree.TinyTree.addNode(Unknown Source)
        at net.sf.saxon.tinytree.TinyBuilder.startElement(Unknown Source)
        at net.sf.saxon.event.Stripper.startElement(Unknown Source)
        at net.sf.saxon.event.ReceivingContentHandler.startElement(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at net.sf.saxon.event.Sender.sendSAXSource(Unknown Source)
        at net.sf.saxon.event.Sender.send(Unknown Source)
        at net.sf.saxon.event.Sender.send(Unknown Source)
        at net.sf.saxon.Controller.transform(Unknown Source)
        at org.apache.tools.ant.taskdefs.optional.TraXLiaison.transform(TraXLiaison.java:194)
        at org.apache.tools.ant.taskdefs.XSLTProcess.process(XSLTProcess.java:812)
        at org.apache.tools.ant.taskdefs.XSLTProcess.execute(XSLTProcess.java:408)
        at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
        at org.apache.tools.ant.Task.perform(Task.java:348)
        at org.apache.tools.ant.Target.execute(Target.java:390)
        at org.apache.tools.ant.Target.performTasks(Target.java:411)
        at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1360)
        at org.apache.tools.ant.Project.executeTarget(Project.java:1329)

Is there something I can do to prevent the XSLT task from exhausting all of my memory? Or should I reconsider my approach?

1 answer

I agree that it should release the memory, but since it does not, you can try breaking the xslt task into separate calls, for example using the ant-contrib for task:

    <for param="file">
      <fileset dir="${import.dir}"/>
      <sequential>
        <xslt in="@{file}" destdir="${import.dir}/sql/" style="${xsl.filename}" force="true">
          <mapper type="glob" from="*.xml" to="*.sql" />
          <factory name="net.sf.saxon.TransformerFactoryImpl"/>
        </xslt>
      </sequential>
    </for>
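The for task is not part of core Ant, so the build file has to load ant-contrib before using it. A minimal sketch, assuming the ant-contrib jar is available locally; the `antcontrib.jar` property and its path are hypothetical:

```xml
<!-- antcontrib.jar is an assumed property; point it at your ant-contrib jar. -->
<property name="antcontrib.jar" location="lib/ant-contrib.jar"/>

<!-- The for task is defined in antlib.xml, so load that resource. -->
<taskdef resource="net/sf/antcontrib/antlib.xml">
  <classpath>
    <pathelement location="${antcontrib.jar}"/>
  </classpath>
</taskdef>
```

If the jar is already in Ant's lib directory, the nested classpath should not be needed.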

If this does not do the trick, then since you are using Saxon, you can call the Saxon Java classes directly in a forked JVM, e.g.:

    <java classname="net.sf.saxon.Transform" failonerror="true" fork="true">
      <arg value="-s:${import.dir}" />
      <arg value="-xsl:${xsl.filename}" />
      <arg value="-o:${import.dir}/sql" />
    </java>
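A forked JVM generally needs Saxon on its own classpath, and its heap can be sized independently of the JVM running Ant. A sketch of the same call with both set explicitly; the `saxon.jar` property, its path, and the 512m limit are assumptions to adjust:

```xml
<!-- saxon.jar is an assumed property; point it at the Saxon jar you use. -->
<property name="saxon.jar" location="lib/saxon.jar"/>

<!-- maxmemory only applies when fork="true"; 512m is an example value. -->
<java classname="net.sf.saxon.Transform" failonerror="true" fork="true" maxmemory="512m">
  <classpath>
    <pathelement location="${saxon.jar}"/>
  </classpath>
  <arg value="-s:${import.dir}" />
  <arg value="-xsl:${xsl.filename}" />
  <arg value="-o:${import.dir}/sql" />
</java>
```

Because the transformation runs in a separate process, whatever memory it accumulates is returned to the operating system when that process exits.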

Or you can combine both approaches, forking one JVM per file:

    <for param="file">
      <fileset dir="${import.dir}"/>
      <sequential>
        <basename property="@{file}.base" file="@{file}" suffix="xml"/>
        <java classname="net.sf.saxon.Transform" failonerror="true" fork="true">
          <arg value="-s:@{file}" />
          <arg value="-xsl:${xsl.filename}" />
          <arg value="-o:${import.dir}/sql/${@{file}.base}.sql" />
        </java>
      </sequential>
    </for>

And for bonus points, you can try to speed things up a bit by running the transformations in parallel:

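    <!-- ant-contrib's for task takes a nested sequential; parallel execution is
         requested with its parallel attribute (threadCount can cap the threads). -->
    <for param="file" parallel="true">
      <fileset dir="${import.dir}"/>
      <sequential>
        <basename property="@{file}.base" file="@{file}" suffix="xml"/>
        <java classname="net.sf.saxon.Transform" failonerror="true" fork="true">
          <arg value="-s:@{file}" />
          <arg value="-xsl:${xsl.filename}" />
          <arg value="-o:${import.dir}/sql/${@{file}.base}.sql" />
        </java>
      </sequential>
    </for>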