XML validation for multiple arbitrary schemas

Consider an XML document that begins as follows and references several schemas (this is NOT a Spring-specific question; the Spring file is just a handy example):

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:jaxrs="http://cxf.apache.org/jaxrs"
       xmlns:osgi="http://www.springframework.org/schema/osgi"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
                           http://cxf.apache.org/jaxrs http://cxf.apache.org/schemas/jaxrs.xsd
                           http://www.springframework.org/schema/osgi http://www.springframework.org/schema/osgi/spring-osgi.xsd">

I want to validate the document, but I don't know in advance which namespaces the document author will use. I trust the author, so I am willing to download schemas from arbitrary URLs. How should I implement my validator?

I know that I can specify my schemas by calling setAttribute("http://java.sun.com/xml/jaxp/properties/schemaSource", new String[] {...}) on a DocumentBuilderFactory instance, but I don't know the schema URLs until the document has been parsed.

Of course, I could extract the XSD URLs myself after parsing the document and then run it through the validator, specifying "http://java.sun.com/xml/jaxp/properties/schemaSource" as mentioned above, but surely there is already an implementation that does this automatically?
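For reference, the manual route mentioned above might look roughly like the sketch below. It only covers the extraction step: it parses the document without validation and splits the root element's xsi:schemaLocation value into namespace/URL pairs. The class name SchemaLocationHints and its methods are hypothetical, and hints on nested elements are not handled.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class SchemaLocationHints {
    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    /** Splits an xsi:schemaLocation value into namespace -> schema URL pairs. */
    static Map<String, String> parse(String schemaLocation) {
        Map<String, String> hints = new LinkedHashMap<>();
        String trimmed = schemaLocation == null ? "" : schemaLocation.trim();
        if (trimmed.isEmpty()) {
            return hints;
        }
        String[] tokens = trimmed.split("\\s+");
        // The value is a whitespace-separated list of pairs; a dangling last token is ignored.
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            hints.put(tokens[i], tokens[i + 1]);
        }
        return hints;
    }

    /** Reads the hints from the root element of a document, without validating it. */
    static Map<String, String> fromRootElement(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getAttributeNS
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            // getAttributeNS returns "" when the attribute is absent
            return parse(root.getAttributeNS(XSI_NS, "schemaLocation"));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not read schema hints", e);
        }
    }
}
```

The URLs in hints.values() could then be handed to the schemaSource attribute for a second, validating parse, which is exactly the double work the question hopes to avoid.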

3 answers

Forgive me for answering my own question... The other answers, from @Eugene Yokota and @forty-two, were very helpful, but I thought neither was complete enough to accept. I needed to do some extra work to arrive at the final solution below. It works fine on JDK 1.6. It lacks sufficient error checking (see the link in Eugene's answer, which is a very complete solution but is not reusable), and I believe it does not cache downloaded XSDs. I think it relies on features specific to the Xerces parser, judging by the apache.org feature URLs.

    InputStream xmlStream = ...

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    factory.setValidating(true);
    factory.setXIncludeAware(true);
    factory.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                         "http://www.w3.org/2001/XMLSchema");
    factory.setFeature("http://apache.org/xml/features/validation/schema-full-checking", true);
    factory.setFeature("http://apache.org/xml/features/honour-all-schemaLocations", true);
    factory.setFeature("http://apache.org/xml/features/validate-annotations", true);
    factory.setFeature("http://apache.org/xml/features/generate-synthetic-annotations", true);

    DocumentBuilder builder = factory.newDocumentBuilder();
    builder.setErrorHandler(new ErrorHandler() {
        public void warning(SAXParseException exception) throws SAXException {
            LOG.log(Level.WARNING, "parse warn: " + exception, exception);
        }
        public void error(SAXParseException exception) throws SAXException {
            LOG.log(Level.SEVERE, "parse error: " + exception, exception);
        }
        public void fatalError(SAXParseException exception) throws SAXException {
            LOG.log(Level.SEVERE, "parse fatal: " + exception, exception);
        }
    });

    Document doc = builder.parse(xmlStream);

I have not confirmed this, but you may find the sample "Use the JAXP Validation API to create a validator and validate input from a DOM which contains inline schemas and multiple validation roots" useful.

In particular,

    factory.setFeature(SCHEMA_FULL_CHECKING_FEATURE_ID, schemaFullChecking);
    factory.setFeature(HONOUR_ALL_SCHEMA_LOCATIONS_ID, honourAllSchemaLocations);
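To show how those two calls might fit into the JAXP Validation API, here is a minimal sketch, not the linked sample itself. It assumes the two constants stand for the Xerces feature URLs quoted elsewhere on this page, and it relies on the documented behavior of SchemaFactory.newSchema() without arguments, which for XML Schema validates using the location hints found in the document. The class name HintValidator is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of any library.
class HintValidator {
    // Assumed values of the constants used in the snippet above.
    static final String SCHEMA_FULL_CHECKING_FEATURE_ID =
            "http://apache.org/xml/features/validation/schema-full-checking";
    static final String HONOUR_ALL_SCHEMA_LOCATIONS_ID =
            "http://apache.org/xml/features/honour-all-schemaLocations";

    /** Validates xml against the schemas named in its own xsi:schemaLocation; returns the problems found. */
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            trySetFeature(factory, SCHEMA_FULL_CHECKING_FEATURE_ID);
            trySetFeature(factory, HONOUR_ALL_SCHEMA_LOCATIONS_ID);
            // No schema is passed in: the validator follows the document's location hints.
            Validator validator = factory.newSchema().newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { problems.add("warning: " + e.getMessage()); }
                public void error(SAXParseException e) { problems.add("error: " + e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (Exception e) {
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }

    private static void trySetFeature(SchemaFactory factory, String feature) {
        try {
            factory.setFeature(feature, true);
        } catch (SAXException e) {
            // Implementation does not recognize this Xerces feature; continue without it.
        }
    }
}
```

An empty list from HintValidator.validate(xml) means no warnings or errors were reported. The features are set defensively because they are Xerces-specific and another implementation may reject them.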

If you create a DocumentBuilderFactory as follows:

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(true);
    dbf.setNamespaceAware(true);
    dbf.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                     "http://www.w3.org/2001/XMLSchema");

You can then set an EntityResolver on the DocumentBuilder instances created by this factory in order to resolve the schema locations specified in the document. The specified location is passed to the resolver in the systemId argument.
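A minimal sketch of that idea follows, assuming the schemas are already available locally as strings keyed by their schema location. The class name ResolvingValidator and the localSchemas map are hypothetical; a real resolver might read from a cache directory or the classpath instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of any library.
class ResolvingValidator {
    /** Validates xml; localSchemas maps a schema location (systemId) to the XSD text. */
    static List<String> validate(String xml, Map<String, String> localSchemas) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(true);
            dbf.setNamespaceAware(true);
            dbf.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                             "http://www.w3.org/2001/XMLSchema");
            DocumentBuilder builder = dbf.newDocumentBuilder();
            builder.setEntityResolver((publicId, systemId) -> {
                String xsd = localSchemas.get(systemId);
                if (xsd == null) {
                    return null; // unknown location: let the parser use its default lookup
                }
                InputSource source = new InputSource(new StringReader(xsd));
                source.setSystemId(systemId); // keep the original location as the schema's identity
                return source;
            });
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { problems.add("warning: " + e.getMessage()); }
                public void error(SAXParseException e) { problems.add("error: " + e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }
}
```

Returning null from the resolver leaves the lookup to the parser, so only the locations present in the map are served locally.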

I thought the builder would do this automatically, without a resolver being specified, but apparently it does not out of the box. Maybe this is controlled by another feature, attribute, or property?
