JAXB validation fails because whitespace is not ignored

Some code snippets.

The Java code does the JAXB unmarshalling. It is pretty simple, copied from online tutorials.

    JAXBContext jc = JAXBContext.newInstance( "xmlreadtest" );
    Unmarshaller u = jc.createUnmarshaller();

    // setting up for validation.
    SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    StreamSource schemaSource = new StreamSource(ReadXml.class.getResource("level.xsd").getFile());
    Schema schema = schemaFactory.newSchema(schemaSource);
    u.setSchema(schema);

    // parsing the xml
    URL url = ReadXml.class.getResource("level.xml");
    Source sourceRoot = (Source)u.unmarshal(url);

The problem element from the XML file contains nothing but ignorable whitespace. It is formatted exactly as shown here, which is how it appears in the file:

 <HashLine _id='FI6' ppLine='1' origLine='1' origFname='level.cpp'> </HashLine> 

The XSD declaration that describes this element:

    <xs:element name="HashLine">
      <xs:complexType>
        <xs:attribute name="origLine" type="xs:NMTOKEN" use="required" />
        <xs:attribute name="origFname" type="xs:string" use="required" />
        <xs:attribute name="_id" type="xs:ID" use="required" />
        <xs:attribute name="ppLine" type="xs:NMTOKEN" use="required" />
      </xs:complexType>
    </xs:element>

The error:

 [org.xml.sax.SAXParseException: cvc-complex-type.2.1: Element 'HashLine' must have no character or element information item [children], because the type content type is empty.] 

I have confirmed that the error comes from this element.

It loads perfectly without validation. But I need validation, because I am going to make big changes and additions to the application, and I have to be sure that everything is properly marshalled/unmarshalled.

It also works fine if I change the complexType to use simpleContent with an xs:string extension. But I get this problem with elements all over the place, and there are a lot of them, in a lot of XSD files. So it is impractical to base every element in the XML documents on xs:string just to work around this problem.
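To see the two declarations side by side, here is a minimal standalone sketch that uses only javax.xml.validation (no JAXB). It assumes the fragment lives in a schema with no target namespace; the class and method names are hypothetical. It validates the HashLine element from the question against the original empty-content declaration and against the simpleContent/xs:string variant.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo: the question's HashLine declaration in two variants,
// each wrapped in a schema with no target namespace (an assumption).
class HashLineContentDemo {
    static final String ATTRS =
          "<xs:attribute name='origLine' type='xs:NMTOKEN' use='required'/>"
        + "<xs:attribute name='origFname' type='xs:string' use='required'/>"
        + "<xs:attribute name='_id' type='xs:ID' use='required'/>"
        + "<xs:attribute name='ppLine' type='xs:NMTOKEN' use='required'/>";

    // As in the question: attributes only, so the content type is empty.
    static final String EMPTY_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='HashLine'><xs:complexType>" + ATTRS
        + "</xs:complexType></xs:element></xs:schema>";

    // The workaround: text content derived from xs:string, same attributes.
    static final String STRING_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='HashLine'><xs:complexType><xs:simpleContent>"
        + "<xs:extension base='xs:string'>" + ATTRS + "</xs:extension>"
        + "</xs:simpleContent></xs:complexType></xs:element></xs:schema>";

    static final String ELEMENT =
        "<HashLine _id='FI6' ppLine='1' origLine='1' origFname='level.cpp'> </HashLine>";

    /** Returns null if xml is valid against xsd, otherwise the error message. */
    static String validate(String xsd, String xml) {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // With no ErrorHandler set, the Validator throws on the first error.
            sf.newSchema(new StreamSource(new StringReader(xsd)))
              .newValidator()
              .validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("empty content:  " + validate(EMPTY_XSD, ELEMENT));
        System.out.println("xs:string base: " + validate(STRING_XSD, ELEMENT));
    }
}
```

With the first schema the message should be the same cvc-complex-type.2.1 error as in the question; with the second, validate returns null, which matches the workaround described above.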

Also, although J2SE 6 uses a SchemaFactory derived from Apache Xerces, it does not seem to accept Xerces's ignore-whitespace feature (i.e. via schemaFactory.setFeature()).
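The question does not say which Xerces feature URI was tried, so this sketch only probes whether a SchemaFactory accepts a given feature name. setFeature throws SAXNotRecognizedException for names the factory does not know and SAXNotSupportedException for names it knows but cannot set. FeatureProbe is a hypothetical helper; the URIs to check are passed on the command line.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

// Hypothetical helper: reports whether a SchemaFactory accepts a feature.
class FeatureProbe {
    static boolean accepts(SchemaFactory sf, String feature, boolean value) {
        try {
            sf.setFeature(feature, value);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // Unknown names, or known names that cannot be set, end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Every implementation is required to support secure processing.
        System.out.println("secure-processing -> "
                + accepts(sf, XMLConstants.FEATURE_SECURE_PROCESSING, true));
        // Pass the feature URIs you want to try as program arguments.
        for (String uri : args) {
            System.out.println(uri + " -> " + accepts(sf, uri, true));
        }
    }
}
```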

2 answers

You can use the StAX API to filter out whitespace-only character events before validation, with an EventFilter:

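    import javax.xml.stream.EventFilter;
    import javax.xml.stream.events.Characters;
    import javax.xml.stream.events.XMLEvent;

    class WhitespaceFilter implements EventFilter {
        @Override
        public boolean accept(XMLEvent event) {
            // reject character events that consist only of whitespace
            return !(event.isCharacters() && ((Characters) event).isWhiteSpace());
        }
    }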

This can be used to wrap your input:

    // strip unwanted whitespace
    XMLInputFactory inputFactory = XMLInputFactory.newInstance();
    XMLEventReader eventReader = inputFactory
            .createXMLEventReader(ReadXml.class.getResourceAsStream("level.xml"));
    eventReader = inputFactory.createFilteredReader(eventReader, new WhitespaceFilter());

    // parsing the xml
    Source sourceRoot = (Source) unmarshaller.unmarshal(eventReader);

    //TODO: proper error + stream handling
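A self-contained sketch of the same filter, assuming JAXB is not on the classpath (it is no longer bundled with the JDK since Java 11). Instead of unmarshalling, it re-serializes the filtered events and checks the result against the question's HashLine declaration with javax.xml.validation. The schema wrapper (no target namespace) and the class/method names are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class WhitespaceFilterDemo {
    // The question's HashLine declaration, wrapped in a no-namespace schema (assumption).
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='HashLine'><xs:complexType>"
        + "<xs:attribute name='origLine' type='xs:NMTOKEN' use='required'/>"
        + "<xs:attribute name='origFname' type='xs:string' use='required'/>"
        + "<xs:attribute name='_id' type='xs:ID' use='required'/>"
        + "<xs:attribute name='ppLine' type='xs:NMTOKEN' use='required'/>"
        + "</xs:complexType></xs:element></xs:schema>";

    // Same filter as in the answer: drop character events that are all whitespace.
    static class WhitespaceFilter implements EventFilter {
        @Override
        public boolean accept(XMLEvent event) {
            return !(event.isCharacters() && ((Characters) event).isWhiteSpace());
        }
    }

    /** Copies xml through the filter and returns the re-serialized result. */
    static String strip(String xml) {
        try {
            XMLInputFactory in = XMLInputFactory.newInstance();
            XMLEventReader reader = in.createFilteredReader(
                    in.createXMLEventReader(new StringReader(xml)), new WhitespaceFilter());
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                // Skip document events so no XML declaration is written.
                if (!event.isStartDocument() && !event.isEndDocument()) {
                    writer.add(event);
                }
            }
            writer.flush();
            writer.close();
            reader.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if xml validates against XSD; validation errors are reported as false. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            sf.newSchema(new StreamSource(new StringReader(XSD)))
              .newValidator()
              .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<HashLine _id='FI6' ppLine='1' origLine='1' origFname='level.cpp'> </HashLine>";
        String stripped = strip(xml);
        System.out.println("raw valid: " + isValid(xml) + ", filtered valid: " + isValid(stripped));
        System.out.println(stripped);
    }
}
```

Note that this filter drops every whitespace-only text node, including one inside an xs:string element where the spaces might be meaningful. If that matters, the XSLT approach in the other answer can target specific elements instead.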

I would suggest writing a very simple XSLT transform that strips the empty content from the specific elements causing the problem (for example, only HashLine elements). Then add a processing step before the data reaches JAXB that uses TransformerFactory, Transformer, etc. to "clean" the data with that XSLT transform. In the XSLT you can also add this sort of cleanup logic for any other JAXB-unfriendly structures you find in the source XML.
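A minimal sketch of such a cleanup step with the JDK's built-in TransformerFactory: an identity transform plus one empty template that swallows whitespace-only text directly inside HashLine, leaving all other whitespace alone. The stylesheet and the HashLineCleaner class are illustrative assumptions; in the real application the cleaned output would be handed to the validating unmarshaller instead of the raw file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical cleanup step run before JAXB sees the document.
class HashLineCleaner {
    // Identity transform, plus an empty template for whitespace-only text
    // nodes whose parent is HashLine (that template wins on priority).
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='HashLine/text()[not(normalize-space())]'/>"
        + "</xsl:stylesheet>";

    static String clean(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<HashLine _id='FI6' ppLine='1' origLine='1' origFname='level.cpp'> </HashLine>";
        System.out.println(clean(xml));
    }
}
```

Because the match pattern names HashLine explicitly, whitespace in other elements (for example xs:string content) is preserved; further element names can be added to that pattern as a union with |.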

