Ignoring "Content Not Allowed in Trailing Section" SAXException

I am using Java's DocumentBuilder.parse(InputStream) to parse an XML document. Sometimes I receive malformed XML documents in which extra junk follows the final >, causing a SAXException: Content is not allowed in trailing section. (In the cases I have seen, the junk is just one or more null bytes.)

I don't care about anything after the final >. Is there an easy way to parse an entire XML document in Java and ignore any trailing junk?

Note that by "ignore" I don't just mean catching and ignoring the exception: I mean ignoring the trailing junk, throwing no exception, and returning a Document object, since the XML up to the final > is valid.

+7
java xml exception sax
2 answers

Since your sender is giving you invalid XML, you must fix it before it reaches the parser if you want to avoid this exception. If you cannot fix the sender, you will need a preprocessing step.

If the situation is simply that you are receiving extra null bytes after the closing tag, as a commenter on the other answer pointed out, this can probably be handled easily by wrapping your input stream in a FilterInputStream that you implement to skip null bytes.
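A minimal sketch of such a filter follows. The class names NullSkippingInputStream and NullByteFilterDemo are hypothetical, not part of any library. It assumes a single-byte-compatible encoding such as UTF-8 or ISO-8859-1; in UTF-16 or UTF-32 documents null bytes are legitimate, and dropping them would corrupt the input. It also drops null bytes everywhere in the stream, not only after the closing tag.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical filter that drops every 0x00 byte from the wrapped stream. */
class NullSkippingInputStream extends FilterInputStream {
    NullSkippingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b;
        do {
            b = super.read();
        } while (b == 0); // skip nulls; -1 (end of stream) passes through
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = super.read(buf, off, len);
            if (n <= 0) {
                return n; // end of stream
            }
            // Compact the chunk in place, keeping only non-null bytes.
            int kept = 0;
            for (int i = 0; i < n; i++) {
                byte b = buf[off + i];
                if (b != 0) {
                    buf[off + kept++] = b;
                }
            }
            if (kept > 0) {
                return kept;
            }
            // The whole chunk was null bytes: read again rather than return 0.
        }
    }
}

class NullByteFilterDemo {
    /** Parses raw bytes into a DOM Document, ignoring any null bytes. */
    static Document parse(byte[] raw) throws Exception {
        try (InputStream in = new NullSkippingInputStream(new ByteArrayInputStream(raw))) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] raw = "<root><item>ok</item></root>\0\0\0".getBytes(StandardCharsets.UTF_8);
        Document doc = parse(raw);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Both read methods are overridden because a parser may use either one; filtering only the single-byte read() would let null bytes through on bulk reads.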

If the problem is more complicated than just null characters, you will of course need a more sophisticated filter, which can be difficult.

If you are using a ContentHandler, you could add a callback to it so that it informs the calling code when the root element's end tag has been processed. Based on that knowledge, the calling code could have logic in its exception handler to simply ignore the exception if the end had already been signaled. At that point, everything the parser was supposed to do has probably been done anyway! But this solution does not seem to apply to your situation, since you are building a Document rather than handling SAX events yourself.
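For readers who do process the document with SAX events, the idea can be sketched as below. The names RootEndTracker and TolerantSaxDemo are hypothetical, and the handler only tracks element depth; a real handler would also do the actual processing. The sketch relies on DefaultHandler rethrowing fatal errors as SAXParseException.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that records when the root element has been closed. */
class RootEndTracker extends DefaultHandler {
    private int depth = 0;
    private boolean rootClosed = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (--depth == 0) {
            rootClosed = true; // the root element's end tag was just processed
        }
    }

    boolean isRootClosed() {
        return rootClosed;
    }
}

class TolerantSaxDemo {
    /**
     * Runs a SAX parse and swallows a parse error only if it occurred
     * after the root element was closed (i.e. in the trailing section).
     */
    static boolean parseToleratingTrailingJunk(byte[] raw) throws Exception {
        RootEndTracker handler = new RootEndTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(raw), handler);
        } catch (SAXParseException e) {
            if (!handler.isRootClosed()) {
                throw e; // the error is inside the document proper: do not hide it
            }
        }
        return handler.isRootClosed();
    }
}
```

Errors raised before the root element closes are still propagated, so a truncated or otherwise broken document is not silently accepted.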

+8

No. A document containing trailing characters is not an XML document. Fix the sender.

-5
