JAXB - unmarshal OutOfMemory: Java Heap Space

I'm trying to use JAXB to unmarshal an XML file, but the file seems to be too large (~500 MB) for the unmarshaller to handle. I keep getting java.lang.OutOfMemoryError: Java heap space at:

 Unmarshaller um = JAXBContext.newInstance("com.sample.xml").createUnmarshaller();
 Export e = (Export) um.unmarshal(new File("SAMPLE.XML"));

I assume this is because it tries to load the whole XML file as a single object graph, and that graph is too large for the Java heap.

Is there a more memory-efficient way to parse large (~500 MB) XML files? Or perhaps an unmarshaller property that would help me process a large XML file?

Here is what my XML looks like:

 <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
 <!-- -->
 <Export xmlns="wwww.foo.com" xmlns:xsi="www.foo1.com" xsi:schemaLocation="www.foo2.com/.xsd">
   <!--- --->
   <Origin ID="foooo" />
   <!---- ---->
   <WorkSets>
     <WorkSet>
       <Work> .....
       <Work> ....
       <Work> .....
     </WorkSet>
     <WorkSet> .... </WorkSet>
   </WorkSets>

I would like to unmarshal at the WorkSet level, while still being able to read all of the Work elements in each WorkSet.

+7
6 answers

What does your XML look like? Generally, for large documents, I recommend using a StAX XMLStreamReader so that JAXB can unmarshal the document in chunks.

Input.xml

The document below has many instances of the person element. We can use JAXB with a StAX XMLStreamReader to unmarshal the corresponding Person objects one at a time and avoid running out of memory.

 <people>
   <person>
     <name>Jane Doe</name>
     <address> ... </address>
   </person>
   <person>
     <name>John Smith</name>
     <address> ... </address>
   </person>
   ....
 </people>

Demo

 import java.io.*;
 import javax.xml.stream.*;
 import javax.xml.bind.*;

 public class Demo {

     public static void main(String[] args) throws Exception {
         XMLInputFactory xif = XMLInputFactory.newInstance();
         XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
         xsr.nextTag(); // Advance to the root people element

         JAXBContext jc = JAXBContext.newInstance(Person.class);
         Unmarshaller unmarshaller = jc.createUnmarshaller();

         // Unmarshal one person element per iteration instead of the whole document
         while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
             Person person = (Person) unmarshaller.unmarshal(xsr);
         }
         xsr.close();
     }
 }

Person

Instead of mapping the root element of the XML document, we need to add the @XmlRootElement annotation to the class that corresponds to the local root of the XML fragment we will unmarshal.

 @XmlRootElement
 public class Person {
 }
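
To see how the cursor moves without needing JAXB on the classpath, here is a minimal sketch that uses only the JDK's StAX API on the WorkSets layout from the question. The class WorkSetCursor, its helper methods, and the sample XML are made up for illustration. The loop checks the current event before advancing, because Unmarshaller.unmarshal(XMLStreamReader) is documented to leave the reader on the token right after the end element, so this shape should also cope with documents that have no whitespace between elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class WorkSetCursor {

    // Hypothetical helper: number of <Work> elements found in each <WorkSet>.
    static List<Integer> countWorkPerWorkSet(String xml) {
        List<Integer> counts = new ArrayList<>();
        try {
            XMLStreamReader xsr = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (xsr.hasNext()) {
                if (xsr.isStartElement() && "WorkSet".equals(xsr.getLocalName())) {
                    // With JAXB, unmarshaller.unmarshal(xsr) would take this place.
                    counts.add(consumeWorkSet(xsr));
                } else {
                    xsr.next();
                }
            }
            xsr.close();
        } catch (XMLStreamException e) {
            // Wrapped only so the sketch is easy to call; real code may prefer to rethrow.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return counts;
    }

    // Reads one <WorkSet> and leaves the reader on the token after its end tag,
    // mirroring where a JAXB unmarshal call leaves the reader.
    private static int consumeWorkSet(XMLStreamReader xsr) throws XMLStreamException {
        int depth = 0;
        int works = 0;
        do {
            if (xsr.isStartElement()) {
                depth++;
                if ("Work".equals(xsr.getLocalName())) {
                    works++;
                }
            } else if (xsr.isEndElement()) {
                depth--;
            }
            xsr.next();
        } while (depth > 0);
        return works;
    }

    public static void main(String[] args) {
        String xml = "<Export xmlns=\"wwww.foo.com\"><WorkSets>"
                + "<WorkSet><Work/><Work/></WorkSet>"
                + "<WorkSet><Work/></WorkSet>"
                + "</WorkSets></Export>";
        System.out.println(countWorkPerWorkSet(xml)); // prints [2, 1]
    }
}
```

For a real 500 MB file you would pass a FileInputStream to createXMLStreamReader instead of a StringReader, and handle each WorkSet as it is read rather than collecting results in a list.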
+9

You can increase the heap space with the -Xmx startup argument.

For large files, SAX processing is more memory efficient since it is event driven and does not load the entire structure into memory.
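
As a rough illustration of the event-driven style, here is a small SAX sketch that pulls the name of each person out of a people document like the one in the answer above. The class PersonNameHandler and the parseNames helper are invented for this example. It collects the names in a list only to keep the demo short; for a really large file you would handle each value inside endElement and then discard it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class PersonNameHandler extends DefaultHandler {

    private final List<String> names = new ArrayList<>();
    private StringBuilder text; // non-null only while inside a <name> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The default factory is not namespace-aware, so match on qName.
        if ("name".equals(qName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver text in several chunks, so append rather than assign.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("name".equals(qName)) {
            names.add(text.toString().trim());
            text = null;
        }
    }

    // Hypothetical convenience method: parse an XML string and return every <name> value.
    static List<String> parseNames(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            PersonNameHandler handler = new PersonNameHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped only so the sketch is easy to call.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

The trade-off compared with JAXB is that the handler has to track by hand which element it is in and build any objects itself.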

+5

I have done a lot of research into parsing very large input sets. It is true that you can combine StAX and JAXB to parse XML fragments selectively, but this is not always possible or preferable. If you are interested in learning more about the topic, please see:

http://xml2java.net/documents/XMLParserTechnologyForProcessingHugeXMLfiles.pdf

In this document, I describe an alternative approach that is very simple and easy to use. It parses arbitrarily large input sets and gives you access to your data as JavaBeans.

+2

Use SAX or StAX. But if the goal is to have an object representation of the whole file in memory, you will still need a lot of memory to hold the contents of such a large file. In that case, your only hope is to increase the heap size with the -Xmx1024m JVM option (which sets the maximum heap size to 1024 MB).
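
To check that the option actually took effect, you can ask the running JVM for its maximum heap size. A minimal sketch (the class name HeapInfo is made up for this example); the reported figure may be slightly below the -Xmx value, depending on the garbage collector in use:

```java
final class HeapInfo {

    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Run it as, for example, java -Xmx1024m HeapInfo and compare the output with a run that omits the flag.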

+1

Use SAX, but you will have to build the Export object yourself.

0

You can try this too. It is kind of bad practice, but it works :)

http://amitsavm.blogspot.in/2015/02/partially-parsing-xml-using-jaxb-by.html

Otherwise, using StAX or SAX, or what Blaise Doughan suggests, is also good and is arguably the standard way. But if you have a complex XML structure and you don't want to annotate your classes manually, preferring to generate them with the XJC tool, then this approach may be useful.

0
