Combining two XML files in Java

Question


I have two XML files with a similar structure that I want to merge into a single file. I am currently using EL4J XML Merge, which I came across in this tutorial. However, it does not merge the way I expect. The main problem is that it does not merge the results from both files into one element containing results 1, 2, 3, and 4. Instead, it simply discards either 1 and 2 or 3 and 4, depending on which file is merged first.

So I would be grateful if anyone with experience of XML Merge could tell me what I might be doing wrong. Alternatively, does anyone know of a good XML API for Java that can merge the files as I require?

Thanks very much in advance for your help.

Edit:

I could really do with some good suggestions on this, so I have added a bounty. I tried jdigital's suggestion, but I still have problems with XML Merge.

The following is an example of the structure of the XML files I'm trying to combine.

    <run xmloutputversion="1.02">
      <info type="a" />
      <debugging level="0" />
      <host starttime="1237144741" endtime="1237144751">
        <status state="up" reason="somereason"/>
        <something avalue="test" test="alpha" />
        <target>
          <system name="computer" />
        </target>
        <results>
          <result id="1">
            <state value="test" />
            <service value="gamma" />
          </result>
          <result id="2">
            <state value="test4" />
            <service value="gamma4" />
          </result>
        </results>
        <times something="0" />
      </host>
      <runstats>
        <finished time="1237144751" timestr="Sun Mar 15 19:19:11 2009"/>
        <result total="0" />
      </runstats>
    </run>

    <run xmloutputversion="1.02">
      <info type="b" />
      <debugging level="0" />
      <host starttime="1237144741" endtime="1237144751">
        <status state="down" reason="somereason"/>
        <something avalue="test" test="alpha" />
        <target>
          <system name="computer" />
        </target>
        <results>
          <result id="3">
            <state value="testagain" />
            <service value="gamma2" />
          </result>
          <result id="4">
            <state value="testagain4" />
            <service value="gamma4" />
          </result>
        </results>
        <times something="0" />
      </host>
      <runstats>
        <finished time="1237144751" timestr="Sun Mar 15 19:19:11 2009"/>
        <result total="0" />
      </runstats>
    </run>

Expected Result

    <run xmloutputversion="1.02">
      <info type="a" />
      <debugging level="0" />
      <host starttime="1237144741" endtime="1237144751">
        <status state="down" reason="somereason"/>
        <status state="up" reason="somereason"/>
        <something avalue="test" test="alpha" />
        <target>
          <system name="computer" />
        </target>
        <results>
          <result id="1">
            <state value="test" />
            <service value="gamma" />
          </result>
          <result id="2">
            <state value="test4" />
            <service value="gamma4" />
          </result>
          <result id="3">
            <state value="testagain" />
            <service value="gamma2" />
          </result>
          <result id="4">
            <state value="testagain4" />
            <service value="gamma4" />
          </result>
        </results>
        <times something="0" />
      </host>
      <runstats>
        <finished time="1237144751" timestr="Sun Mar 15 19:19:11 2009"/>
        <result total="0" />
      </runstats>
    </run>

+13

java xml api parsing

Mark davidson Mar 15 '09 at 20:19


12 answers

Mcdowell · Answer 1 · 2009-03-30T20:11:22+0000

Not very elegant, but you can do it with the DOM parser and XPath:

    import java.io.File;
    import java.io.IOException;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.Result;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathExpression;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;

    public class MergeXmlDemo {

      public static void main(String[] args) throws Exception {
        // proper error/exception handling omitted for brevity
        File file1 = new File("merge1.xml");
        File file2 = new File("merge2.xml");
        Document doc = merge("/run/host/results", file1, file2);
        print(doc);
      }

      private static Document merge(String expression, File... files) throws Exception {
        XPathFactory xPathFactory = XPathFactory.newInstance();
        XPath xpath = xPathFactory.newXPath();
        XPathExpression compiledExpression = xpath.compile(expression);
        return merge(compiledExpression, files);
      }

      private static Document merge(XPathExpression expression, File... files) throws Exception {
        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
        docBuilderFactory.setIgnoringElementContentWhitespace(true);
        DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        // The first file is the base; the other files are merged into it
        Document base = docBuilder.parse(files[0]);

        Node results = (Node) expression.evaluate(base, XPathConstants.NODE);
        if (results == null) {
          throw new IOException(files[0] + ": expression does not evaluate to node");
        }

        for (int i = 1; i < files.length; i++) {
          Document merge = docBuilder.parse(files[i]);
          Node nextResults = (Node) expression.evaluate(merge, XPathConstants.NODE);
          // Move every child of the matched node into the base document
          while (nextResults.hasChildNodes()) {
            Node kid = nextResults.getFirstChild();
            nextResults.removeChild(kid);
            kid = base.importNode(kid, true);
            results.appendChild(kid);
          }
        }

        return base;
      }

      private static void print(Document doc) throws Exception {
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        DOMSource source = new DOMSource(doc);
        Result result = new StreamResult(System.out);
        transformer.transform(source, result);
      }
    }

This assumes that you can simultaneously hold at least two documents in RAM.

stwissel · Answer 2 · 2011-04-18T17:04:21+0000

I use XSLT to merge XML files. It lets me tune the merge operation, either simply slamming the content together or merging at a specific level. It is a bit more work (and XSLT syntax is somewhat special), but it is very flexible. You need a few things here:

a) Include an additional file
b) Copy the source file 1:1
c) Create your merge point, with or without duplicate prevention

a) At the beginning I have:

    <xsl:param name="mDocName">yoursecondfile.xml</xsl:param>
    <xsl:variable name="mDoc" select="document($mDocName)" />

This lets you refer to the second file through $mDoc.

b) The instructions for copying the source tree 1:1 are two templates:

    <!-- Copy everything including attributes as default action -->
    <xsl:template match="*">
      <xsl:element name="{name()}">
        <xsl:apply-templates select="@*" />
        <xsl:apply-templates />
      </xsl:element>
    </xsl:template>

    <xsl:template match="@*">
      <xsl:attribute name="{name()}"><xsl:value-of select="." /></xsl:attribute>
    </xsl:template>

With nothing else, you get a 1:1 copy of your first source file. This works with any type of XML.

c) The merging part is file-specific. Suppose you have event elements with an event ID attribute, and you do not want duplicate IDs. The template would look like this:

    <xsl:template match="events">
      <xsl:variable name="allEvents" select="descendant::*" />
      <events>
        <!-- copies all events from the first file -->
        <xsl:apply-templates />
        <!-- Merge the new events in. You need to adjust the select clause -->
        <xsl:for-each select="$mDoc/logbook/server/events/event">
          <xsl:variable name="curID" select="@id" />
          <xsl:if test="not ($allEvents[@id=$curID]/@id = $curID)">
            <xsl:element name="event">
              <xsl:apply-templates select="@*" />
              <xsl:apply-templates />
            </xsl:element>
          </xsl:if>
        </xsl:for-each>
      </events>
    </xsl:template>

Of course, you can compare other things, such as tag names. It is also up to you how deep the merge goes. If you do not have a key to compare, the construct becomes simpler, for example for logs:

    <xsl:template match="logs">
      <xsl:element name="logs">
        <xsl:apply-templates select="@*" />
        <xsl:apply-templates />
        <xsl:apply-templates select="$mDoc/logbook/server/logs/log" />
      </xsl:element>
    </xsl:template>

To run XSLT in Java, use this:

    Source xmlSource = new StreamSource(xmlFile);
    Source xsltSource = new StreamSource(xsltFile);
    Result xmlResult = new StreamResult(resultFile);
    TransformerFactory transFact = TransformerFactory.newInstance();
    Transformer trans = transFact.newTransformer(xsltSource);
    // Load Parameters if we have any
    if (ParameterMap != null) {
      for (Entry<String, String> curParam : ParameterMap.entrySet()) {
        trans.setParameter(curParam.getKey(), curParam.getValue());
      }
    }
    trans.transform(xmlSource, xmlResult);

Or you can download the Saxon XSLT processor and run it from the command line (Linux shell example):

    #!/bin/bash
    notify-send -t 500 -u low -i gtk-dialog-info "Transforming $1 with $2 into $3 ..."
    # That's actually the only relevant line below
    java -cp saxon9he.jar net.sf.saxon.Transform -t -s:$1 -xsl:$2 -o:$3
    notify-send -t 1000 -u low -i gtk-dialog-info "Extraction into $3 done!"

YMMV

Mark davidson · Answer 3 · 2009-06-14T16:47:54+0000

Thanks to everyone for their suggestions. Unfortunately, none of the suggested methods turned out to be suitable in the end, because I needed rules for how the various nodes of the structure were merged.

So I took the DTD for the XML files I was merging and, from it, created a number of classes that reflect the structure. I then used XStream to deserialize the XML files into those classes.

I annotated my classes, so the merge became a process of combining rules assigned through annotations with some reflection to merge the objects, rather than merging the actual XML structure.

If anyone is interested in the code, which in this case merges Nmap XML files, see http://fluxnetworks.co.uk/NmapXMLMerge.tar.gz. The code is not perfect, and I admit it is not very flexible, but it definitely works. I plan to reimplement the system so that it parses the DTD automatically when I have some free time.

tyler · Answer 4 · 2009-03-26T23:47:15+0000

It would help if you were explicit about the result you are trying to achieve. Is this what you are asking for?

Doc A:

 <root> <a/> <b> <c/> </b> </root>

Doc B:

 <root> <d/> </root>

Merging Results:

 <root> <a/> <b> <c/> </b> <d/> </root>

Are you worried about scaling to large documents?

The easiest way to implement this in Java is to use a streaming XML parser (google for "java StAX"). If you use the javax.xml.stream library, you will find that XMLEventWriter has a convenient method, XMLEventWriter#add(XMLEvent). All you have to do is loop over the top-level elements in each document and add them to your writer using this method to produce the merged result. The only funky part is implementing the reader logic so that it considers (that is, only calls "add" on) the top-level nodes.

I recently implemented this approach, so let me know if you need some hints.
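The loop described above could be sketched roughly as follows. This is only a minimal illustration using the JDK's javax.xml.stream API, not the implementation mentioned in the answer: the class name StaxMergeSketch, the merge method and the choice to pass the documents as strings are assumptions made for the example. It tracks the element depth and forwards every event that lies inside a root element, so the children of each input root end up under one new root.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, named for this example only.
public class StaxMergeSketch {

    /** Copies the content of each input's root element under one new root named rootName. */
    public static String merge(String rootName, String... inputs) {
        try {
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            XMLEventFactory eventFactory = XMLEventFactory.newInstance();
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);

            writer.add(eventFactory.createStartElement("", "", rootName));
            for (String input : inputs) {
                XMLEventReader reader = inputFactory.createXMLEventReader(new StringReader(input));
                int depth = 0; // number of currently open elements
                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    if (event.isEndElement()) {
                        depth--;
                    }
                    // depth >= 1 means we are inside the input's root element,
                    // so the root's own start and end tags are skipped.
                    if (depth >= 1) {
                        writer.add(event);
                    }
                    if (event.isStartElement()) {
                        depth++;
                    }
                }
                reader.close();
            }
            writer.add(eventFactory.createEndElement("", "", rootName));
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not merge XML inputs", e);
        }
    }
}
```

Calling StaxMergeSketch.merge("root", docA, docB) with the two documents shown earlier should yield a single root that contains a, b (with c) and d in that order. Because events are streamed straight from reader to writer, neither document has to be held in memory as a tree.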

zdenekhorak · Answer 5 · 2012-08-17T12:38:53+0000

Here's how it should look with XML Merge:

    action.default=MERGE

    xpath.info=/run/info
    action.info=PRESERVE

    xpath.result=/run/host/results/result
    action.result=MERGE
    matcher.result=ID

You must set the ID matcher for the //result node and set the PRESERVE action for the //info node. Also be aware that the .properties file XML Merge uses is case sensitive: you have to use "xpath", not "XPath", in your .properties.

Remember to specify the -config parameter as follows:

 java -cp lib\xmlmerge-full.jar; ch.elca.el4j.services.xmlmerge.tool.XmlMergeTool -config xmlmerge.properties example1.xml example2.xml

jdigital · Answer 6 · 2009-03-15T22:52:54+0000

I had a look at the referenced link; it is odd that XmlMerge does not work as expected. Your example seems straightforward. Have you read the section called Using XPath Declarations with XmlMerge? Following that example, try setting up an XPath for the results and setting it to merge. If I am reading the documentation correctly, it would look something like this:

    XPath.resultsNode=results
    action.resultsNode=MERGE

Andy white · Answer 7 · 2009-03-15T20:38:15+0000

Perhaps you could write a Java application that deserializes the XML documents into objects and then "merges" the individual objects programmatically into a collection. You can then serialize the collection object back out to an XML file with everything merged.

The JAXB API has some tools that can convert an XML document or schema into Java classes. The xjc tool might be able to do this, although I cannot remember whether you can create classes directly from an XML document or whether you need to generate a schema first. There are tools that can generate a schema from an XML document.

Hope this helps ... not sure if this is what you were looking for.

Staxman · Answer 8 · 2009-03-27T02:54:27+0000

Besides using StAX directly (which does make sense), it would probably be easier with StaxMate ( http://staxmate.codehaus.org/Tutorial ). Just create two SMInputCursors, plus child cursors if need be, and then do a typical merge sort with the two cursors. It is similar to traversing DOM documents in a recursive-descent manner.

tyler · Answer 9 · 2009-03-31T16:40:15+0000

So, you are only interested in merging the "results" elements? Is everything else ignored? The fact that input0 has <info type="a"/> and input1 has <info type="b"/>, while the expected result has <info type="a"/>, seems to suggest this.

If you are not worried about scaling and want to solve this problem quickly, I would suggest writing a problem-specific bit of code that uses a simple library such as JDOM to examine the inputs and write the output.

Trying to write a generic tool that is smart enough to handle all possible merge cases would be rather time-consuming, because you would need to expose configuration options to define the merge rules. If you know exactly what your data will look like and exactly how the merge should be performed, I would suggest that your algorithm simply walk each XML input and write to a single XML output.
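A problem-specific merge of this kind might look like the sketch below. It uses the JDK's built-in DOM API instead of JDOM so that it needs no extra library, and it hard-codes two rules read off the expected result in the question: every status element is kept, with the second file's placed first, and the result children of the second file's results element are appended to the first file's. The class and method names are made up for this illustration, and everything else in the second file is deliberately ignored.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class for this example; the rules are specific to the <run> files in the question.
public class RunMergeSketch {

    public static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Merges extra into base (modifying base in place) and returns base. */
    public static Document merge(Document base, Document extra) {
        Element baseHost = first(base, "host");
        Element baseStatus = first(base, "status");
        Element baseResults = first(base, "results");

        // Rule 1: keep every <status>, inserting the second file's before the existing one.
        NodeList statuses = extra.getElementsByTagName("status");
        for (int i = 0; i < statuses.getLength(); i++) {
            baseHost.insertBefore(base.importNode(statuses.item(i), true), baseStatus);
        }

        // Rule 2: append the element children of the second file's <results>.
        Element extraResults = first(extra, "results");
        for (Node n = extraResults.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                baseResults.appendChild(base.importNode(n, true));
            }
        }
        return base;
    }

    // Returns the first element with the given tag name, or fails loudly if there is none.
    private static Element first(Document doc, String tag) {
        NodeList list = doc.getElementsByTagName(tag);
        if (list.getLength() == 0) {
            throw new IllegalArgumentException("Missing <" + tag + "> element");
        }
        return (Element) list.item(0);
    }
}
```

Rule 2 iterates over the children of results rather than searching for every result element, because the runstats section also contains an element named result that must not be copied.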

Ram · Answer 10 · 2009-04-01T06:32:44+0000

You can try Dom4J, which provides very good tools for extracting information using XPath queries and also makes it easy to write XML. You just need to play with the API a bit to get the job done.

Vik Ermolenko · Answer 11 · 2018-08-24T11:07:39+0000

Sometimes you just need to concatenate XML files that have a similar structure into one, for example:

xml1 file:

 <root> <level1> ... </level1> <!--many records--> <level1> ... </level1> </root>

xml2 file:

 <root> <level1> ... </level1> <!--many records--> <level1> ... </level1> </root>

In this case, the following procedure using the jdom2 library may help you:

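    import java.io.Writer;
    import java.nio.charset.Charset;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.LinkedList;
    import java.util.List;
    import org.jdom2.Document;
    import org.jdom2.Element;
    import org.jdom2.filter.ElementFilter;
    import org.jdom2.input.SAXBuilder;
    import org.jdom2.output.Format;
    import org.jdom2.output.XMLOutputter;

    void concatXML(Path fSource, Path fDest) {
      Document jdomSource = null;
      Document jdomDest = null;
      List<Element> elems = new LinkedList<Element>();
      SAXBuilder jdomBuilder = new SAXBuilder();
      try {
        jdomSource = jdomBuilder.build(fSource.toFile());
        jdomDest = jdomBuilder.build(fDest.toFile());
        Element root = jdomDest.getRootElement();
        root.detach();
        // Name of the record element, taken from the first child element of the source root
        String sourceNextElementName = jdomSource.getRootElement().getChildren().get(0).getName();
        // Collect the records first and detach them afterwards,
        // so the document is not modified while it is being iterated
        for (Element record : jdomSource.getRootElement().getDescendants(new ElementFilter(sourceNextElementName)))
          elems.add(record);
        for (Element elem : elems)
          elem.detach();
        root.addContent(elems);

        Document newDoc = new Document(root);
        XMLOutputter xmlOutput = new XMLOutputter();
        xmlOutput.output(newDoc, System.out);
        xmlOutput.setFormat(Format.getPrettyFormat());
        // try-with-resources closes the writer so the buffered output reaches the file
        try (Writer writer = Files.newBufferedWriter(fDest, Charset.forName("UTF-8"))) {
          xmlOutput.output(newDoc, writer);
        }
      } catch (Exception e) {
        e.printStackTrace();
      }
    }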

Neil coffey · Answer 12 · 2009-03-27T02:08:56+0000

Have you considered not parsing the XML "properly" at all, and simply treating the files as long strings and using boring old things such as hash maps and regular expressions? This may be one of those cases where the fancy acronyms with X in them just make the job fiddlier than it needs to be.

Obviously, this depends a bit on how much data you actually need to parse out while doing the merge. But by the sound of things, the answer to that is: not much.
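As a rough illustration of the string-based idea, the sketch below lifts the text between the results tags out of the second file with a regular expression and splices it in front of the closing results tag of the first file. The class name and the exact pattern are assumptions made for this example. It only works while the results tag carries no attributes and appears once per file, and it does not validate the XML in any way.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class for this example; it treats the XML purely as text.
public class TextMergeSketch {

    // Captures everything between <results> and </results>; DOTALL lets '.' match line breaks.
    private static final Pattern RESULTS =
            Pattern.compile("<results>(.*?)</results>", Pattern.DOTALL);

    /** Returns first with the results content of second inserted before its closing results tag. */
    public static String merge(String first, String second) {
        Matcher fromSecond = RESULTS.matcher(second);
        if (!fromSecond.find()) {
            return first; // nothing to merge in
        }
        int close = first.indexOf("</results>");
        if (close < 0) {
            throw new IllegalArgumentException("First document has no </results> tag");
        }
        return first.substring(0, close) + fromSecond.group(1) + first.substring(close);
    }
}
```

This keeps everything else from the first file untouched, which matches the question's expected result for the results element but not for status, so a second rule would be needed for that.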
