Using a schema to rearrange the elements of an XML document to match the schema

Let's say I have an XML document (represented as text, a W3C DOM, etc.), as well as an XML schema. The XML document has all the right elements as defined by the schema, but in the wrong order.

How do I use the schema to "reorder" the elements in the document so that they follow the order the schema defines?

I know this should be possible, probably using XSOM, because the JAXB XJC code generator annotates its generated classes with the correct ordering of the elements.

However, I am not familiar with the XSOM API, and it is pretty dense, so I am hoping that someone here has experience with it and can point me in the right direction. Something like "which child elements are permitted inside this parent element, and in what order?"


Let me give you an example.

I have an XML document like this:

    <A>
      <Y/>
      <X/>
    </A>

I have an XML schema that says that the contents of <A> must be an <X> followed by a <Y>. Clearly, if I try to validate the document against the schema, it fails, because <X> and <Y> are in the wrong order. But I know in advance that my document is "wrong", so I am not using the schema for validation yet. I do know, however, that my document contains all the correct elements defined by the schema, just in the wrong order.

What I want to do is programmatically examine the schema (probably using XSOM, which is an object model for XML Schema) and ask it what the contents of <A> should be. The API would then tell me "you need an <X> followed by a <Y>".

I then take my XML document (using the DOM API) and rearrange <X> and <Y> accordingly, so that the document now validates against the schema.

It is important to understand what XSOM is: a Java API that represents the information contained in the XML schema, not the information contained in my instance document.

What I do not want to do is generate code from the schema, since the schema is unknown at build time. In addition, XSLT is of no use, because the correct order of the elements is determined solely by the data dictionary contained in the schema.

I hope that is now sufficiently explicit.

+5
java xml xsd xsom
4 answers

Your problem translates to this: you have an XML file that does not match the schema, and you want to transform it into something valid.

With XSOM, you can read the structure of the XSD and perhaps analyse the XML, but you would still need an additional mapping from the invalid form to the valid form. Using a stylesheet would be a lot easier, because you could walk through the XML using XPath expressions to process the elements in the right order. With an XML document where you want apples before pears, the stylesheet first copies the apple node (/Fruit/Apple) and then copies the pear node. That way, whatever the order in the old file, the nodes end up in the right order in the new file.
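To make the apples-before-pears idea concrete, here is a minimal sketch using the XSLT support that ships with the JDK (javax.xml.transform). The Fruit, Apple and Pear element names come from the example above; the class name FruitReorder and the stylesheet itself are made up for illustration, and a real stylesheet would need one such template per element that has ordered children.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of any library.
class FruitReorder {

    // Hand-written stylesheet: copies all Apple children first, then all
    // Pear children, regardless of their order in the input document.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/Fruit'>"
      + "<Fruit>"
      + "<xsl:copy-of select='Apple'/>"
      + "<xsl:copy-of select='Pear'/>"
      + "</Fruit>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String reorder(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The pear comes first in the input; the output lists the apple first.
        System.out.println(reorder("<Fruit><Pear/><Apple/></Fruit>"));
    }
}
```

Note that this stylesheet silently drops any child of Fruit that is neither Apple nor Pear, so it only suits documents that are already known to contain nothing but the expected elements.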

What you can do with XSOM is read the XSD and generate a stylesheet that reorders the data, then transform the XML using that stylesheet. Once XSOM has generated a stylesheet for the XSD, you can simply reuse the stylesheet until the XSD is modified or another XSD is needed.

Of course, you could use XSOM to copy the nodes in the right order straight away. But since that means your own code has to walk through all the nodes and child nodes, it might take some time to finish. A stylesheet would do the same work, but the transformer will probably handle it faster: it can work directly on the data, while the Java code has to get and set every node through the DOM document's properties.


So, I would use XSOM to create a stylesheet for the XSD that just copies the XML node by node, and then reuse that stylesheet over and over again. The stylesheet would only need to be regenerated when the XSD changes, and it would run faster than having the Java API walk through the nodes itself. The stylesheet does not care about the input order, so the output will always be in the right order.

To make it more interesting, you could skip XSOM altogether and try working with a stylesheet that reads the XSD and creates another stylesheet from it. This generated stylesheet would copy the XML nodes in the exact order defined in the schema. Would it be difficult? In essence, the first stylesheet would have to generate a template for each element and make sure that the children of that element are processed in the right order.

When I think about it, I wonder whether this has been done before. It would be very generic and could handle almost any XSD/XML combination.

Let's see... Using "//xsd:element/@name", you would get all the element names in the schema. Every unique name would need to be translated into a template. Inside each template, you need to process the child nodes of that specific element, which is a little harder to get. An element may have a ref attribute, which you will need to follow. Otherwise, get all the xsd:element nodes declared beneath it.
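As a rough illustration of reading the child order straight from the XSD, here is a sketch that uses only the JDK's DOM and XPath APIs instead of XSOM. It makes strong assumptions that are not part of the answer above: every element declares its content inline as xsd:complexType/xsd:sequence, each element name is declared only once, and a ref is resolved by simply stripping its prefix. Named types, xsd:choice, xsd:all and groups are ignored. The class name SchemaChildOrder and the sample schema are made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class SchemaChildOrder {

    // Made-up sample schema: <A> must contain <X> followed by <Y>.
    static final String SAMPLE_XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='A'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='X' type='xsd:string'/>"
      + "<xsd:element ref='Y'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "<xsd:element name='Y' type='xsd:string'/>"
      + "</xsd:schema>";

    // Maps each declared element name to the ordered names of its children.
    static Map<String, List<String>> childOrder(String xsd) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed so the xsd: prefix can be matched
        Document doc = dbf.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xsd)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "xsd".equals(prefix)
                    ? XMLConstants.W3C_XML_SCHEMA_NS_URI
                    : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) { return null; }
            public Iterator<String> getPrefixes(String uri) { return null; }
        });

        Map<String, List<String>> order = new LinkedHashMap<>();
        NodeList decls = (NodeList) xpath.evaluate(
            "//xsd:element[@name]", doc, XPathConstants.NODESET);
        for (int i = 0; i < decls.getLength(); i++) {
            Element decl = (Element) decls.item(i);
            NodeList kids = (NodeList) xpath.evaluate(
                "xsd:complexType/xsd:sequence/xsd:element",
                decl, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int j = 0; j < kids.getLength(); j++) {
                Element kid = (Element) kids.item(j);
                String ref = kid.getAttribute("ref");
                // A ref may be a prefixed QName; keep only the local part.
                names.add(ref.isEmpty()
                    ? kid.getAttribute("name")
                    : ref.substring(ref.indexOf(':') + 1));
            }
            order.put(decl.getAttribute("name"), names);
        }
        return order;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childOrder(SAMPLE_XSD));
    }
}
```

The resulting map could drive either a generated stylesheet or a direct DOM rearrangement. For anything beyond simple inline sequences, a real schema model such as XSOM is likely the safer choice.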

+2

I do not have a good answer yet, but I have to point out that there is potential for ambiguity. Consider this schema:

    <xs:element name="root">
      <xs:choice>
        <xs:sequence>
          <xs:element name="foo"/>
          <xs:element name="bar">
            <xs:element name="dee"/>
            <xs:element name="dum"/>
          </xs:element>
        </xs:sequence>
        <xs:sequence>
          <xs:element name="bar">
            <xs:element name="dum"/>
            <xs:element name="dee"/>
          </xs:element>
          <xs:element name="foo"/>
        </xs:sequence>
      </xs:choice>
    </xs:element>

and this XML input:

    <root>
      <foo/>
      <bar>
        <dum/>
        <dee/>
      </bar>
    </root>

This can be made to match the schema either by reordering <foo> and <bar>, or by reordering <dee> and <dum>. There seems to be no reason to prefer one over the other.

+3

I was stuck on the same problem for about two weeks. Finally I got a breakthrough: this can be achieved using JAXB's marshal/unmarshal feature.

In JAXB marshal/unmarshal, XML validation is an optional feature. So when creating the Marshaller and Unmarshaller objects, we do not call the setSchema(schema) method. Omitting this step skips XML validation during marshal/unmarshal.

So now:

  • If a required element in the XSD is not present in the XML, it is simply skipped.
  • If a tag that is not present in the XSD is present in the XML, no error occurs, and the tag is absent from the new XML obtained after unmarshalling/marshalling.
  • If elements are not in sequence, they are reordered. This is taken care of by the JAXB-generated POJOs that we pass when creating the JAXBContext.
  • If an element is misplaced inside some other tag, it is omitted from the new XML. No error occurs during unmarshalling/marshalling.

    import java.io.File;
    import java.io.IOException;
    import java.io.StringReader;
    import java.io.StringWriter;

    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.JAXBException;
    import javax.xml.bind.Marshaller;
    import javax.xml.bind.Unmarshaller;

    // FileUtils is presumably the Apache Commons IO class.
    import org.apache.commons.io.FileUtils;
    import org.xml.sax.InputSource;

    public class JAXBSequenceUtil {

        public static void main(String[] args) throws JAXBException, IOException {
            String xml = FileUtils.readFileToString(new File(
                    "./conf/out/Response_103_1015700001&^&IOF.xml"));
            System.out.println("Before marshalling : \n" + xml);
            String sequencedXml = correctSequence(xml, "org.acord.standards.life._2");
            System.out.println("After marshalling : \n" + sequencedXml);
        }

        /**
         * @param xml
         *            - XML string to be corrected for sequence.
         * @param jaxbPackage
         *            - package containing JAXB generated classes using XSD.
         * @return String - xml with corrected sequence
         * @throws JAXBException
         */
        public static String correctSequence(String xml, String jaxbPackage)
                throws JAXBException {
            JAXBContext jaxbContext = JAXBContext.newInstance(jaxbPackage);

            // No setSchema(...) call, so unmarshalling does not validate.
            Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
            Object txLifeType = unmarshaller.unmarshal(new InputSource(
                    new StringReader(xml)));
            System.out.println(txLifeType);

            // Marshalling writes the elements back in the order of the generated classes.
            StringWriter stringWriter = new StringWriter();
            Marshaller marshaller = jaxbContext.createMarshaller();
            marshaller.marshal(txLifeType, stringWriter);
            return stringWriter.toString();
        }
    }
+3

Basically, you want to take the root element and, from there, recursively look at the child elements in the document and the child elements defined in the schema, and make the order match.

I will give you a solution in C# syntax, since that is what I code in day and night, and it is pretty close to Java. Please note that I have to guess about XSOM, since I do not know its API. I also made up the XML DOM methods, since giving you the ones built into C# would not help :)

// Assume the first call is SortChildrenIntoNewDocument( sourceDom.DocumentElement, targetDom.DocumentElement, schema.RootElement )

    public void SortChildrenIntoNewDocument( XmlElement source, XmlElement target, SchemaElement schemaElement )
    {
        // whatever method you use to ask the XSOM to tell you the correct contents
        SchemaElement[] orderedChildren = schemaElement.GetChildren();
        for( int i = 0; i < orderedChildren.Length; i++ )
        {
            XmlElement sourceChild = source.SelectChildByName( orderedChildren[ i ].Name );
            // skip children the source does not contain (assumes the made-up lookup returns null)
            if( sourceChild == null ) continue;
            XmlElement targetChild = target.AddChild( sourceChild );
            // recursive call
            SortChildrenIntoNewDocument( sourceChild, targetChild, orderedChildren[ i ] );
        }
    }

I would not recommend a recursive method if the tree is deep; in that case you would have to create some "tree walker" type objects. The advantage of that approach is that you can handle more complex cases. For example, when the schema says an element may occur zero or more times, you can keep processing source nodes until there are no more that match, and then advance the schema walker from there.
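For Java readers, here is a sketch of the same recursive idea using the standard W3C DOM API. Instead of guessing at XSOM calls, it assumes the schema has already been reduced to a plain map from parent element name to the ordered list of allowed child names; that map, the class name DomReorder and the sample document are all hypothetical. Rather than building a new document, it rearranges the children in place, and it handles repeated elements by moving every child with a matching name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class DomReorder {

    // Rearranges the element children of 'parent' in place so that they follow
    // the order listed in childOrder for the parent's name, then recurses.
    static void reorder(Element parent, Map<String, List<String>> childOrder) {
        // Snapshot the element children first, because moving nodes
        // changes the sibling chain while we iterate.
        List<Element> children = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                children.add((Element) n);
            }
        }

        List<String> order = childOrder.get(parent.getTagName());
        if (order != null) {
            for (String name : order) {
                for (Element child : children) {
                    if (child.getTagName().equals(name)) {
                        // appendChild on an existing child moves it to the end,
                        // so after the loop the children follow 'order'.
                        parent.appendChild(child);
                    }
                }
            }
        }

        for (Element child : children) {
            reorder(child, childOrder);
        }
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<A><Y/><X/></A>");
        Map<String, List<String>> order =
            Collections.singletonMap("A", Arrays.asList("X", "Y"));
        reorder(doc.getDocumentElement(), order);
        // Print the child names in their new order.
        for (Node n = doc.getDocumentElement().getFirstChild();
                n != null; n = n.getNextSibling()) {
            System.out.println(n.getNodeName());
        }
    }
}
```

Children whose names are not listed for their parent are left in front of the reordered ones, and text nodes are not moved. Like the C# version, this sketch is recursive, so very deep documents would still call for an explicit stack or walker.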

+1
