How to preserve newlines in CDATA when creating XML?

I want to write text containing whitespace characters, such as newlines and tabs, to an XML file, so I use

    Element element = xmldoc.createElement("TestElement");
    element.appendChild(xmldoc.createCDATASection(somestring));

but when I read it back using

    Node vs = xmldoc.getElementsByTagName("TestElement").item(0);
    String x = vs.getFirstChild().getNodeValue();

I get a string that no longer contains newlines. When I look at the XML file on disk, the newline characters seem to have been saved, so the problem must occur when the XML file is read back.

How can I preserve the newlines?

Thanks!

+6
java xml w3c cdata newline
5 answers

I don't know how you parse and write your document, but here is an extended example based on your code:

    // creating the document in-memory
    Document xmldoc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    Element element = xmldoc.createElement("TestElement");
    xmldoc.appendChild(element);
    element.appendChild(xmldoc.createCDATASection("first line\nsecond line\n"));

    // serializing the xml to a string
    DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
    DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
    LSSerializer writer = impl.createLSSerializer();
    String str = writer.writeToString(xmldoc);

    // printing the xml for verification of whitespace in cdata
    System.out.println("--- XML ---");
    System.out.println(str);

    // de-serializing the xml from the string
    final Charset charset = Charset.forName("utf-16");
    final ByteArrayInputStream input = new ByteArrayInputStream(str.getBytes(charset));
    Document xmldoc2 = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input);
    Node vs = xmldoc2.getElementsByTagName("TestElement").item(0);
    final Node child = vs.getFirstChild();
    String x = child.getNodeValue();

    // print the value, yay!
    System.out.println("--- Node Text ---");
    System.out.println(x);

Serializing with LSSerializer is the W3C-specified way to do this (it is part of DOM Level 3 Load and Save). The output is as expected, with the line breaks intact:

    --- XML ---
    <?xml version="1.0" encoding="UTF-16"?>
    <TestElement><![CDATA[first line
    second line
    ]]></TestElement>
    --- Node Text ---
    first line
    second line
+5

You need to check the type of each node using node.getNodeType(). If the type is CDATA_SECTION_NODE, you need to add the CDATA delimiters (<![CDATA[ and ]]>) around the value returned by node.getNodeValue().
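A minimal sketch of that check, assuming the JDK's default DOM parser. The class name CdataNodeTypes and the helpers parse and childMarkup are hypothetical, chosen only for illustration. It walks every child instead of relying on getFirstChild(), because the first child may be a whitespace text node rather than the CDATA section:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CdataNodeTypes {

    // Hypothetical helper: rebuilds the markup of an element's text children,
    // putting the CDATA delimiters back around CDATA section nodes.
    static String childMarkup(Node element) {
        StringBuilder sb = new StringBuilder();
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.CDATA_SECTION_NODE) {
                sb.append("<![CDATA[").append(n.getNodeValue()).append("]]>");
            } else if (n.getNodeType() == Node.TEXT_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: parses an XML string with the default factory settings.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The newline before the CDATA section becomes a separate text node.
        Document doc = parse(
                "<TestElement>\n<![CDATA[first line\nsecond line\n]]></TestElement>");
        System.out.println(childMarkup(doc.getDocumentElement()));
    }
}
```

With the default (non-coalescing) factory settings, the CDATA section should come back as its own node with its newlines intact.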

+2

You do not have to use CDATA to preserve whitespace characters. The XML specification defines how to encode these characters.

So, for example, if an element's value contains a newline (line feed), you can encode it as

    &#xA;

Carriage return:

    &#xD;

And so on.
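As a rough illustration, the sketch below parses an element whose text uses these character references and shows the decoded characters with visible escapes. The class name CharRefNewlines and the helper textOf are hypothetical, and the JDK's bundled DOM parser is assumed:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CharRefNewlines {

    // Hypothetical helper: parses the XML string and returns the root element's text.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = textOf(
                "<TestElement>first line&#xA;second line&#xD;&#xA;tab:&#x9;end</TestElement>");
        // Make the control characters visible when printing.
        System.out.println(text
                .replace("\r", "\\r")
                .replace("\n", "\\n")
                .replace("\t", "\\t"));
    }
}
```

Character references are expanded after line-end normalization, so a carriage return written as &#xD; should reach the application unchanged, unlike a literal carriage return in the file.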

+2

EDIT: removed everything that was not needed.

I am curious to know which DOM implementation you are using, because what you describe does not match the default behavior I see in the two JVMs I tried (they ship with a Xerces implementation). I am also interested in which newline characters your document contains.

I am not sure whether CDATA preserves whitespace. I suspect there are many factors. Do DTDs or schemas affect whitespace handling?

You could try using the xml:space="preserve" attribute.
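On the question of which newline characters the document contains: the XML specification requires a parser to normalize literal line endings on input, so \r\n and a lone \r are both delivered as \n, including inside CDATA. The sketch below is one way to check what your parser hands back. The class name LineEndingNormalization and the helper firstChildValue are hypothetical, and the JDK's bundled parser is assumed:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LineEndingNormalization {

    // Hypothetical helper: returns the value of the root element's first child node.
    static String firstChildValue(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getFirstChild().getNodeValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The input mixes Windows (\r\n) and old Mac (\r) line endings.
        String value = firstChildValue(
                "<TestElement><![CDATA[first line\r\nsecond line\rthird]]></TestElement>");
        // Show the line endings that survived parsing.
        System.out.println(value.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

If your code later splits or compares on \r\n, this normalization can make it look as though the line breaks were lost even though they were only converted.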

0

xml:space='preserve' is not the answer. It only concerns whitespace-only nodes. That is, it applies if you want to keep the whitespace nodes in

    <this xml:space='preserve'>
      <has/>
      <whitespace/>
    </this>

But note that those whitespace nodes contain ONLY whitespace.
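To see those whitespace-only nodes, the sketch below counts them among the root element's children, with and without the attribute. The class name WhitespaceNodes and its helper are hypothetical, and the default settings of the JDK's DocumentBuilderFactory (non-validating, element-content whitespace not ignored) are assumed:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitespaceNodes {

    // Hypothetical helper: counts the root's child text nodes that hold only whitespace.
    static int whitespaceOnlyTextChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            int count = 0;
            for (Node n = doc.getDocumentElement().getFirstChild();
                    n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.TEXT_NODE
                        && n.getNodeValue().trim().isEmpty()) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String with = "<this xml:space='preserve'>\n  <has/>\n  <whitespace/>\n</this>";
        String without = "<this>\n  <has/>\n  <whitespace/>\n</this>";
        System.out.println(whitespaceOnlyTextChildren(with));
        System.out.println(whitespaceOnlyTextChildren(without));
    }
}
```

Under these assumptions both documents should report the same count, which suggests the attribute is a hint to the consuming application rather than something that changes what this parser returns.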

I have also been struggling to get Xerces to generate events that would let me isolate the CDATA content. I have no solution yet.

0