How effective is XPath compared to using DOM in Dom4J?

Question

How effective is XPath compared to using DOM in Dom4J?

For example, consider the following xml

<root>
  <childNode attribute1="value1">
    <grandChildNode attrib1="val1" attrib2="val2">some content1</grandChildNode>
    <grandChildNode attrib1="val1" attrib2="val2">some content2</grandChildNode>
    <grandChildNode attrib1="val1" attrib2="val2">some content3</grandChildNode>
  </childNode>
  <childNode attribute1="value1">
    <grandChildNode attrib1="val1" attrib2="val2">some content1</grandChildNode>
    <grandChildNode attrib1="val1" attrib2="val2">some content2</grandChildNode>
    <grandChildNode attrib1="val1" attrib2="val2">some content3</grandChildNode>
  </childNode>
  <childNode attribute1="value1">
    <grandChildNode attrib1="val1" attrib2="val2">some content1</grandChildNode>
    <grandChildNode attrib1="val1" attrib2="val2">some content2</grandChildNode>
    <grandChildNode attrib1="val1" attrib2="val2">some content3</grandChildNode>
  </childNode>
</root>

Is it more efficient to use the DOM to get the root node and then loop through the childNode and grandChildNode elements, or to use XPath expressions to collect the data from the childNode and grandChildNode elements?

+4

dom xml xpath dom4j

Ram Mar 31 '09 at 12:45


1 answer

Jon cram · Accepted Answer · 2009-04-07T10:33:28+0000

If you want to process the XML document in its entirety, parsing the XML into a DOM will almost always be the least efficient option in terms of deserialization time, CPU usage and memory usage.

Parsing into a DOM requires approximately 10-15 times as much memory as the XML document takes up on disk. For example, a 1 megabyte XML document parsed into a DOM will occupy 10-15 megabytes of memory.

Only parse into a DOM if you intend to modify some or all of the data and then write the result back out as an XML document. For all other use cases, the DOM is a bad choice.
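As a rough illustration of the one case where a DOM pays off, the sketch below parses a document, changes an attribute on every matching element and serializes the tree back to XML. It uses the JDK's built-in javax.xml classes rather than dom4j so that it runs without extra libraries; the class and method names (DomRoundTrip, setAttributeOnAll) are made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: parse, modify in memory, write back out.
public class DomRoundTrip {

    static String setAttributeOnAll(String xml, String tag, String name, String value)
            throws Exception {
        // Build the full in-memory tree (this is the expensive step).
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Modify every element with the given tag name.
        NodeList nodes = doc.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            ((Element) nodes.item(i)).setAttribute(name, value);
        }

        // Serialize the modified tree back to an XML string.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><childNode attribute1=\"value1\"/></root>";
        System.out.println(setAttributeOnAll(xml, "childNode", "attribute1", "changed"));
    }
}
```

The whole tree has to exist in memory before the first change can be made, which is the cost described above.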

XPath is often significantly less resource-intensive, but this depends on the length of the document (i.e., the number of "childNode" elements) and on where in the document the data you are interested in is located.

The memory usage and completion time of XPath tend to increase the further down the document you go. For example, let's say you have an XML document with 20,000 childNode elements, each childNode has a unique identifier that you know in advance, and you want to extract a known childNode from it. Retrieving the 18,345th childNode will use much, much more memory than retrieving the third.

So if you use XPath to retrieve all the childNode elements, it may well be less efficient than parsing into a DOM. XPath is generally a convenient way to extract parts of an XML document; I would not recommend using it to process an entire XML document.
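To make the two approaches from the question concrete, here is a minimal sketch that collects the text of every grandChildNode twice: once by looping over the tree by hand and once with an XPath expression. It relies on the JDK's javax.xml APIs instead of dom4j so it is self-contained, and the class and method names are invented for illustration. Note that in this sketch the XPath expression is evaluated against an already-parsed tree, so it only contrasts the traversal step and says nothing about parsing cost.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical comparison of manual traversal and XPath selection.
public class DomVersusXPath {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Walk root -> childNode -> grandChildNode by hand.
    static List<String> viaDomLoop(Document doc) {
        List<String> texts = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip whitespace text nodes and unrelated elements.
            if (child.getNodeType() != Node.ELEMENT_NODE
                    || !"childNode".equals(child.getNodeName())) {
                continue;
            }
            NodeList grandChildren = child.getChildNodes();
            for (int j = 0; j < grandChildren.getLength(); j++) {
                Node grandChild = grandChildren.item(j);
                if (grandChild.getNodeType() == Node.ELEMENT_NODE
                        && "grandChildNode".equals(grandChild.getNodeName())) {
                    texts.add(grandChild.getTextContent().trim());
                }
            }
        }
        return texts;
    }

    // Let an XPath expression select the same nodes.
    static List<String> viaXPath(Document doc) throws Exception {
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("/root/childNode/grandChildNode", doc, XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            texts.add(hits.item(i).getTextContent().trim());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root><childNode attribute1=\"value1\">"
                + "<grandChildNode attrib1=\"val1\">some content1 </grandChildNode>"
                + "<grandChildNode attrib1=\"val1\">some content2 </grandChildNode>"
                + "</childNode></root>");
        System.out.println(viaDomLoop(doc));
        System.out.println(viaXPath(doc));
    }
}
```

Both methods return the same list; the XPath version is shorter to write, while the loop makes every step of the traversal explicit.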

By far the best approach, if you really want to extract and process all the data in an XML document, is to use a SAX-based reader. It will be about two orders of magnitude faster and use far fewer resources than any other approach.
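A SAX-based reader for the example document might look roughly like the sketch below. It streams through the XML and keeps only the text of each grandChildNode, so no tree is ever built. It is written against the JDK's SAX parser, not dom4j, and the class name SaxGrandChildReader and its method are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical streaming reader: collects grandChildNode text via SAX callbacks.
public class SaxGrandChildReader {

    static List<String> readGrandChildren(String xml) throws Exception {
        List<String> texts = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            // Non-null only while the parser is inside a grandChildNode element.
            private StringBuilder current;

            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                if ("grandChildNode".equals(qName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text may arrive in several chunks, so append rather than assign.
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("grandChildNode".equals(qName) && current != null) {
                    texts.add(current.toString().trim());
                    current = null;
                }
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><childNode attribute1=\"value1\">"
                + "<grandChildNode attrib1=\"val1\">some content1 </grandChildNode>"
                + "<grandChildNode attrib1=\"val1\">some content2 </grandChildNode>"
                + "</childNode></root>";
        System.out.println(readGrandChildren(xml));
    }
}
```

The handler only holds the text of the element it is currently inside, which is why memory use stays flat no matter how many childNode elements the document contains.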

However, it also depends on the amount of data you are dealing with. For the example XML document you gave, you will not notice any practical difference. Yes, the DOM will be “slow” and SAX will be “fast,” but we are talking about differences of milliseconds or microseconds.

SAX can be hundreds or thousands of times faster than the DOM, but you won't notice the difference between 2 microseconds and 2 milliseconds. When you are dealing with a document containing 20,000 childNode elements, however, 2 seconds versus 200 seconds will become a problem.

