IndexOutOfBoundsException when handling empty CDATA with transformer

I want to extract certain nodes from a large XML file. This works well until a CDATA section without any content appears.

Output:

    ERROR: ''
    javax.xml.transform.TransformerException: java.lang.IndexOutOfBoundsException
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:732)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:336)
        at xml_test.XML_Test.extractXML2(XML_Test.java:698)
        at xml_test.XML_Test.main(XML_Test.java:811)
    Caused by: java.lang.IndexOutOfBoundsException
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getTextCharacters(XMLStreamReaderImpl.java:1143)
        at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:261)
        at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:171)
        at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:120)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:674)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:723)
        ... 3 more
    ---------
    java.lang.IndexOutOfBoundsException
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getTextCharacters(XMLStreamReaderImpl.java:1143)
        at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:261)
        at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:171)
        at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:120)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:674)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:723)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:336)
        at xml_test.XML_Test.extractXML2(XML_Test.java:698)
        at xml_test.XML_Test.main(XML_Test.java:811)

The code:

    InputStream stream = new FileInputStream("C:\\myFile.xml");
    XMLInputFactory factory = XMLInputFactory.newInstance();
    XMLStreamReader reader = factory.createXMLStreamReader(stream);
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer t = tf.newTransformer();

    String extractPath = "/root";
    String path = "";

    while (reader.hasNext()) {
        reader.next();
        if (reader.isStartElement()) {
            path += "/" + reader.getLocalName();
            if (path.equals(extractPath)) {
                StringWriter writer = new StringWriter();
                StAXSource src = new StAXSource(reader);
                StreamResult res = new StreamResult(writer);
                t.transform(src, res); // Exception thrown
                System.out.println(writer.toString());
                path = path.substring(0, path.lastIndexOf("/"));
            }
        } else if (reader.isEndElement()) {
            path = path.substring(0, path.lastIndexOf("/"));
        }
    }

XML that causes the error:

 <foo><![CDATA[]]></foo> 

Can I make the Transformer simply ignore this? Or what would an alternative implementation look like? I cannot change the input XML!

1 answer

This is a Xerces implementation issue; see https://issues.apache.org/jira/browse/XERCESJ-1033

The parser apparently does not cope with an empty CDATA section, so the only suggestions I can offer are:

  • Change the XML parsing implementation
  • Remove the empty CDATA from the source files (replace "<![CDATA[]]>" with ""),
    or put a space inside the CDATA, for example <![CDATA[ ]]>
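
If the files themselves must stay untouched, the second tip can also be applied in memory, just before the text reaches the parser. Here is a minimal sketch of that idea. The class name EmptyCdataStripper and its method are hypothetical names of mine, not part of any library, and the sketch assumes the document (or the chunk being processed) fits in a String, which may not hold for a very large file.

```java
final class EmptyCdataStripper {

    // The exact token that triggers the exception in the question.
    private static final String EMPTY_CDATA = "<![CDATA[]]>";

    // Returns a copy of the XML text with every empty CDATA section removed.
    // This is a plain text replacement, not a parse: it would also touch the
    // same character sequence inside a comment or inside another CDATA section.
    static String stripEmptyCdata(String xml) {
        return xml.replace(EMPTY_CDATA, "");
    }

    public static void main(String[] args) {
        String xml = "<root><foo><![CDATA[]]></foo>"
                   + "<foo><![CDATA[some data here]]></foo></root>";
        // prints <root><foo></foo><foo><![CDATA[some data here]]></foo></root>
        System.out.println(stripEmptyCdata(xml));
    }
}
```

The cleaned string can then be wrapped in a StringReader and handed to XMLInputFactory.createXMLStreamReader(Reader) instead of the FileInputStream.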

Below is an example with a different implementation.

JAXB

With JAXB, you map your XML to POJOs in a straightforward way.

For example, if you have the following XML file at C:\myFile.xml:

    <root>
        <foo><![CDATA[]]></foo>
        <foo><![CDATA[some data here]]></foo>
    </root>

You may have the following POJOs:

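    // Root.java
    import java.util.List;
    import javax.xml.bind.annotation.*;

    @XmlRootElement
    @XmlAccessorType(XmlAccessType.FIELD) // bind fields only, not the getter/setter pair
    public class Root {

        @XmlElement(name = "foo")
        private List<Foo> foo;

        public List<Foo> getFooList() {
            return foo;
        }

        public void setFooList(List<Foo> fooList) {
            this.foo = fooList;
        }
    }

    // Foo.java
    import javax.xml.bind.annotation.*;

    @XmlType(name = "foo")
    public class Foo {

        // The text content of <foo>, including CDATA
        @XmlValue
        private String content;

        @Override
        public String toString() {
            return content;
        }
    }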

You can then unmarshal the XML into objects with the following snippet:

    import java.io.File;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.JAXBException;
    import javax.xml.bind.Unmarshaller;

    public class Main {
        public static void main(String[] args) {
            try {
                File file = new File("C:\\myFile.xml");
                JAXBContext jaxbContext = JAXBContext.newInstance(Root.class);
                Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();

                // Unmarshal the whole document into the Root POJO
                Root root = (Root) jaxbUnmarshaller.unmarshal(file);
                for (Foo foo : root.getFooList()) {
                    System.out.println(String.format("Foo content: |%s|", foo));
                }
            } catch (JAXBException e) {
                e.printStackTrace();
            }
        }
    }

I tested this and it did not cause any errors.
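
If you would rather stay with StAX because of the file size, another way to "change the implementation" might be to copy the subtree with XMLEventReader and XMLEventWriter instead of passing a StAXSource to the Transformer. The stack trace shows the failure inside the StAXStream2SAX bridge, which this route does not use. The following is only a sketch: the class SubtreeExtractor and the method extractFirst are hypothetical names, it matches by local element name rather than by the full path used in the question, and I have not verified that it avoids the problem on every JDK version.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class SubtreeExtractor {

    // Copies the first element with the given local name, including all of
    // its descendants, into a string. Returns "" if no such element exists.
    static String extractFirst(Reader in, String localName) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);

        int depth = 0; // 0 = still searching, > 0 = inside the wanted subtree
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (depth == 0) {
                boolean wanted = event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(localName);
                if (!wanted) {
                    continue; // skip everything before the wanted element
                }
            }
            if (event.isStartElement()) {
                depth++;
            }
            writer.add(event); // copy the event unchanged
            if (event.isEndElement() && --depth == 0) {
                break; // the wanted element has just been closed
            }
        }
        writer.flush();
        writer.close();
        reader.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><foo><![CDATA[]]></foo>"
                   + "<foo><![CDATA[some data here]]></foo></root>";
        System.out.println(extractFirst(new StringReader(xml), "root"));
    }
}
```

Whether the CDATA sections come out as CDATA or as plain escaped text depends on the StAX implementation, so treat the exact output format as unspecified.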


Source: https://habr.com/ru/post/1211422/

