How to preserve newlines in CDATA when generating XML?

Question

I want to write text that contains whitespace characters, such as newlines and tabs, into an XML file, so I use:

    Element element = xmldoc.createElement("TestElement");
    element.appendChild(xmldoc.createCDATASection(somestring));

But when I read it back using:

    Node vs = xmldoc.getElementsByTagName("TestElement").item(0);
    String x = vs.getFirstChild().getNodeValue();

I get a string that no longer contains any newlines. When I look at the XML file on disk, the newline characters appear to be preserved, so the problem occurs when the XML file is read back in.

How can I preserve the newlines?

Thanks!

+6

java xml w3c cdata newline

clamp Aug 1 '09 at 15:52

5 answers

You need to check the type of each node using node.getNodeType(). If the type is CDATA_SECTION_NODE, you need to concatenate the CDATA delimiters (<![CDATA[ and ]]>) around the value returned by node.getNodeValue().
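As a rough illustration of that check, here is a minimal sketch, assuming the default JAXP parser (which is not coalescing by default, so CDATA sections come back as their own nodes). The class and method names (CdataGuardDemo, childrenWithCdataGuards) are made up for this example, and plain text nodes are appended without any escaping:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class CdataGuardDemo {

    // Walks the children of the document element and re-adds the CDATA
    // delimiters around the value of every CDATA section node.
    static String childrenWithCdataGuards(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            for (Node n = doc.getDocumentElement().getFirstChild();
                    n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.CDATA_SECTION_NODE) {
                    out.append("<![CDATA[").append(n.getNodeValue()).append("]]>");
                } else if (n.getNodeType() == Node.TEXT_NODE) {
                    // Assumption: text is copied as-is, without escaping.
                    out.append(n.getNodeValue());
                }
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childrenWithCdataGuards(
                "<TestElement><![CDATA[first line\nsecond line\n]]></TestElement>"));
    }
}
```

If the factory is switched to setCoalescing(true), CDATA sections are merged into ordinary text nodes and the CDATA_SECTION_NODE branch is never taken.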

+2

fpmurphy1 Aug 1 '09 at 16:16

You do not have to use CDATA to preserve whitespace. The XML specification defines how to encode these characters.

So, for example, if you have an element whose value contains a newline, you should encode it as:

  &#xA;

Carriage Return:

  &#xD;

And so on
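To see the character references in action, here is a small sketch, assuming the default JAXP DOM parser. The names CharRefDemo and textOf are invented for the example. The parser expands the references while reading, so the element text ends up containing a real line feed and a real carriage return:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class CharRefDemo {

    // Parses the XML and returns the text content of the root element.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // &#xA; is a line feed, &#xD; is a carriage return.
        String text = textOf("<TestElement>first line&#xA;second line&#xD;</TestElement>");
        System.out.println(text.contains("\n")); // line feed survived
        System.out.println(text.endsWith("\r")); // carriage return survived
    }
}
```

The carriage return is the case where this matters most: a literal carriage return in the file is normalized away by the parser, while one written as &#xD; is kept.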

+2

Liorh Aug 1 '09 at 16:48

EDIT: cut all the irrelevant material.

I am curious to know which DOM implementation you are using, because what you describe does not match the default behavior of the one in the couple of JVMs I tried (they ship with a Xerces implementation). I am also interested in which newline characters your document contains.

I am not sure that CDATA is guaranteed to preserve whitespace. I suspect there are many factors involved. Don't DTDs and schemas affect how whitespace is handled?

You could try using the xml:space="preserve" attribute.
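On the point about which newline characters the document has: an XML parser normalizes line endings on input, so a \r\n pair or a lone \r in the file reaches the application as a single \n, inside CDATA as well. A minimal sketch of that, assuming the default JAXP parser; LineEndDemo and cdataValue are names made up for the example:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class LineEndDemo {

    // Parses the XML and returns the value of the root element's first
    // child, which is assumed to be the CDATA section.
    static String cdataValue(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getFirstChild().getNodeValue();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String value = cdataValue(
                "<TestElement><![CDATA[first\r\nsecond\rthird\n]]></TestElement>");
        // After parsing, no carriage returns should be left in the value.
        System.out.println(value.indexOf('\r') < 0);
        System.out.println(value.split("\n").length);
    }
}
```

So if the line breaks look different after a round trip (for example Windows line endings turning into plain \n), that is the parser's end-of-line handling rather than whitespace being dropped.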

0

Mcdowell Aug 1 '09 at 16:15

xml:space='preserve' is not the answer. It only applies to nodes that consist entirely of whitespace. That is, it is what you use if you want the whitespace nodes in

 <this xml:space='preserve'> <has/> <whitespace/> </this>

But note that those whitespace nodes contain ONLY whitespace.

I have also been struggling to get Xerces to generate events that allow the CDATA content to be isolated. I have no solution yet.

0

Mike beckerle Dec 13 '14 at 6:36

Aviad Ben Dov · Accepted Answer · 2009-08-08T11:43:03+0000

I don't know how you parse and write your document, but here is an extended code example based on yours:

    // creating the document in-memory
    Document xmldoc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    Element element = xmldoc.createElement("TestElement");
    xmldoc.appendChild(element);
    element.appendChild(xmldoc.createCDATASection("first line\nsecond line\n"));

    // serializing the xml to a string
    DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
    DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
    LSSerializer writer = impl.createLSSerializer();
    String str = writer.writeToString(xmldoc);

    // printing the xml for verification of whitespace in cdata
    System.out.println("--- XML ---");
    System.out.println(str);

    // de-serializing the xml from the string
    final Charset charset = Charset.forName("utf-16");
    final ByteArrayInputStream input = new ByteArrayInputStream(str.getBytes(charset));
    Document xmldoc2 = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input);
    Node vs = xmldoc2.getElementsByTagName("TestElement").item(0);
    final Node child = vs.getFirstChild();
    String x = child.getNodeValue();

    // print the value, yay!
    System.out.println("--- Node Text ---");
    System.out.println(x);

Serializing with LSSerializer is the W3C way to do this. The output is as expected, with the line breaks preserved:

    --- XML ---
    <?xml version="1.0" encoding="UTF-16"?>
    <TestElement><![CDATA[first line
    second line
    ]]></TestElement>
    --- Node Text ---
    first line
    second line
