In my application, I modify parts of XML files that begin like this:
    <?xml version="1.0" encoding="UTF-8"?>

    <myElement>
    ...
Note the empty line before <myElement>. After loading, modifying and saving the file, the result is not what I want:
    <?xml version="1.0" encoding="UTF-8"?>
    <myElement>
    ...
I found out that the whitespace (one blank line) between the XML declaration and the document element is not represented in the DOM at all. The following stand-alone code reproduces the problem:
    String source = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>\n\n<empty/>";
    byte[] sourceBytes = source.getBytes("UTF-16");

    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    // java.io.ByteArrayInputStream, rather than the JDK-internal ByteInputStream
    Document doc = builder.parse(new ByteArrayInputStream(sourceBytes));

    DOMImplementationLS domImplementation = (DOMImplementationLS) doc.getImplementation();
    LSSerializer lsSerializer = domImplementation.createLSSerializer();
    System.out.println(lsSerializer.writeToString(doc));

    // output: <?xml version="1.0" encoding="UTF-16"?>\n<empty/>
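To check the claim that the blank line never reaches the DOM, one can list the direct children of the parsed Document. The sketch below is only an illustration: the class name `PrologWhitespaceCheck` and the helper `documentChildren` are made up for this example, and it uses a UTF-8 variant of the test string.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

class PrologWhitespaceCheck {

    /** Hypothetical helper: returns "nodeType:nodeName" for each direct child of the Document. */
    public static List<String> documentChildren(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> children = new ArrayList<>();
            // Walk only the top level: anything in the prolog would show up here.
            for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
                children.add(n.getNodeType() + ":" + n.getNodeName());
            }
            return children;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String source = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n\n<empty/>";
        // Node.ELEMENT_NODE is 1, so this prints [1:empty]: no Text node for the blank line.
        System.out.println(documentChildren(source));
    }
}
```

The only child is the document element. In the DOM, a Document node cannot have Text children, so there is no place where the parser could keep whitespace from the prolog.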
Does anyone have an idea how to avoid this? Essentially, I want the output to be identical to the input. (I know that the XML declaration will be regenerated because it is not part of the DOM, but that is not a problem.)
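One possible workaround, sketched here under stated assumptions rather than offered as a definitive answer: since the DOM cannot hold the prolog whitespace, copy the prolog text from the original source and prepend it to a serialization that omits the XML declaration. The class `PrologPreservingSave` and its methods are hypothetical names, and `prologOf` assumes the prolog contains only the XML declaration and whitespace (no comments, processing instructions or DOCTYPE). The `xml-declaration` parameter is the standard DOM Level 3 Load and Save configuration parameter.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class PrologPreservingSave {

    /**
     * Hypothetical helper: returns everything before the root element's start tag.
     * Assumes the prolog holds only the XML declaration and whitespace.
     */
    public static String prologOf(String xml) {
        int declEnd = xml.startsWith("<?xml") ? xml.indexOf("?>") + 2 : 0;
        int rootStart = xml.indexOf('<', declEnd);
        return xml.substring(0, rootStart);
    }

    /** Parses the text, leaves room for DOM changes, and writes it back with the original prolog. */
    public static String roundTrip(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // ... modify the DOM here ...

            LSSerializer serializer =
                    ((DOMImplementationLS) doc.getImplementation()).createLSSerializer();
            // Ask the serializer not to write its own XML declaration.
            serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
            // Drop any leading whitespace the serializer may emit before the root element.
            String body = serializer.writeToString(doc).replaceFirst("^\\s+", "");

            // Re-attach the original declaration and the whitespace that followed it.
            return prologOf(xml) + body;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String source = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n\n<empty/>";
        System.out.println(roundTrip(source));
    }
}
```

This keeps the original declaration as well, so the declared encoding has to match the encoding actually used when the string is written to disk. Whitespace after the root element's end tag is lost in the same way and would need similar handling.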
java dom xml parsing whitespace
Jens bannmann