Handling Empty Nodes with the Java DOM

Question


I have a question about XML, the Java DOM, and empty nodes. I am working on a project that takes an XML descriptor file for abstract machines (for parsing text) and runs a number of input strings through them. Constructing and interpreting the abstract machines works fine, but I have run into a rather interesting XML requirement: I need to be able to read an empty InputString node as the empty string ("") and still run my routines. The problem occurs when I try to retrieve this empty node from my XML tree. Doing so throws a NullPointerException, and then, as a rule, bad things start to happen. Here is the offending XML snippet (note that the first element is empty):

    <InputStringList>
        <InputString></InputString>
        <InputString>000</InputString>
        <InputString>111</InputString>
        <InputString>01001</InputString>
        <InputString>1011011</InputString>
        <InputString>1011000</InputString>
        <InputString>01010</InputString>
        <InputString>1010101110</InputString>
    </InputStringList>

I extract the lines from the list using:

    // Get input strings to be validated
    xmlElement = (Element) xmlMachine.getElementsByTagName(XML_INPUT_STRING_LIST).item(0);
    xmlNodeList = xmlElement.getElementsByTagName(XML_INPUT_STRING);
    for (int j = 0; j < xmlNodeList.getLength(); j++) {
        // Add input string to list
        if (xmlNodeList.item(j).getFirstChild().getNodeValue() != null) {
            arrInputStrings.add(xmlNodeList.item(j).getFirstChild().getNodeValue());
        } else {
            arrInputStrings.add("");
        }
    }

How should I handle this empty case? I found a lot of information about removing empty text nodes, but I still need to read empty nodes as empty strings. Ideally, I would like to avoid using a special character to indicate an empty string.

Thank you in advance for your time.

+4

java dom xml parsing

phobos51594 Oct 24 '10 at 22:08


2 answers

You can use a library like jOOX to simplify standard DOM manipulation. With jOOX, you get the list of strings directly:

    List<String> strings = $(xmlMachine).find(XML_INPUT_STRING_LIST)
                                        .find(XML_INPUT_STRING)
                                        .texts();

+1

Lukas Eder Jul 12 '12 at 13:38


bobince · Accepted Answer · 2010-10-24T22:37:15+0000

 if (xmlNodeList.item(j).getFirstChild().getNodeValue() != null) {

The nodeValue of a text node should never be null. It is firstChild itself that is null when the element is empty, so that is what needs to be checked:

    Node firstChild = xmlNodeList.item(j).getFirstChild();
    arrInputStrings.add(firstChild == null ? "" : firstChild.getNodeValue());
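As a self-contained illustration of that check, here is a minimal sketch. The class name `FirstChildNullCheckDemo`, the method `readInputStrings`, and the shortened inline XML are made up for this example and are not from the question's code; only the `InputString` tag name comes from the snippet above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; names and inline XML are illustrative only.
final class FirstChildNullCheckDemo {

    // Reads every <InputString> element, mapping an empty element to "".
    static List<String> readInputStrings(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("InputString");
        List<String> result = new ArrayList<>();
        for (int j = 0; j < nodes.getLength(); j++) {
            // An empty element has no children, so firstChild is null here.
            Node firstChild = nodes.item(j).getFirstChild();
            result.add(firstChild == null ? "" : firstChild.getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<InputStringList>"
                + "<InputString></InputString>"
                + "<InputString>000</InputString>"
                + "</InputStringList>";
        // The first entry is the empty string, the second is "000".
        System.out.println(readInputStrings(xml));
    }
}
```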

However, note that this still relies on the content being exactly one text node. If an element contains another element, or some text followed by a CDATA section, reading the value of the first child is not enough to get all of the text.

What you really want is the textContent property of the DOM Level 3 Core, which will give you all the text inside the element, however it is contained.

 arrInputStrings.add(xmlNodeList.item(j).getTextContent());

It is available in Java 1.5 onwards.
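To make the difference concrete, here is a rough sketch that compares reading only the first child with calling getTextContent(). The class `TextContentDemo`, its method names, and the sample XML (in particular the element that mixes text with a CDATA section) are invented for this illustration. Coalescing is switched off explicitly, which is also the factory default, so that the CDATA section stays a separate node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; names and sample XML are illustrative only.
final class TextContentDemo {

    static NodeList inputStrings(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Keep CDATA sections as separate nodes (this is the default setting).
        factory.setCoalescing(false);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getElementsByTagName("InputString");
    }

    // Null-checked first-child approach: sees only the first child node.
    static List<String> firstChildValues(String xml) throws Exception {
        NodeList nodes = inputStrings(xml);
        List<String> result = new ArrayList<>();
        for (int j = 0; j < nodes.getLength(); j++) {
            Node firstChild = nodes.item(j).getFirstChild();
            result.add(firstChild == null ? "" : firstChild.getNodeValue());
        }
        return result;
    }

    // getTextContent approach: concatenates all text inside each element.
    static List<String> textContents(String xml) throws Exception {
        NodeList nodes = inputStrings(xml);
        List<String> result = new ArrayList<>();
        for (int j = 0; j < nodes.getLength(); j++) {
            result.add(nodes.item(j).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<InputStringList>"
                + "<InputString></InputString>"
                + "<InputString>01<![CDATA[10]]>11</InputString>"
                + "</InputStringList>";
        System.out.println(firstChildValues(xml)); // second entry is only "01"
        System.out.println(textContents(xml));     // second entry is "011011"
    }
}
```

Both approaches return the empty string for the empty element, but only getTextContent() picks up the text that follows the first child.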
