Dom4J: preserving whitespace when writing a file

I am working on a program that uses Dom4J to write XML files. The database I am writing for has a convenient XML schema for validation and import. Dom4J works fine, but I can't figure out how to set the preserve field in Dom4J's XMLWriter class. I have a specific element where I need an encoded "\n".

The javadoc for this class is a bit sparse: http://dom4j.sourceforge.net/dom4j-1.6.1/apidocs/org/dom4j/io/XMLWriter.html

I tried playing with the OutputFormat object, but no dice.

Can someone tell me how to ensure that the XMLWriter object preserves the whitespace of the elements in the dom4j tree when writing to the file?

Thanks,

Donald

Let's say that I start with:

    Element accession = factory.createElement("title");
    List<String> AUT = new ArrayList<String>();
    AUT.add("author1");
    AUT.add("author2");
    String title = "Title";

I would like to have an output similar to:

 <title>author1 author2 Title</title> 

With a line break between the entries in the title element.

    DefaultEntity e = new DefaultEntity("#10");
    if (AUT.size() > 1) {
        for (String a : AUT) {
            accession.addText(a);
            accession.add(e);
        }
        accession.addText(title);
    }

This does not work: it throws an IllegalAddException.

1 answer

First of all, the "preserve" property has nothing to do with preserving the encoding of a previously encoded character; it concerns preserving the whitespace contained in an element. This is usually controlled by the xml:space="preserve" attribute.

However, if your use case is that you have an encoded newline in your input that you want to keep in your output, you have a problem. DOM4J decodes all entity and character references to the corresponding Java characters (UTF-16). This is partially controlled by configuring the underlying XMLReader, but as far as I know, an XMLReader does not report the start and end of character references: they are silently replaced with the corresponding character values.
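
To see that decoding for yourself, here is a minimal sketch that talks to the JDK's built-in SAX XMLReader directly instead of going through dom4j. The class name CharRefDemo and the method parseText are made up for this illustration; the point is only that the handler receives a plain newline character and never sees the original reference.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of dom4j.
class CharRefDemo {

    /** Parses the XML and returns all character data the XMLReader reports. */
    static String parseText(String xml) {
        final StringBuilder text = new StringBuilder();
        try {
            XMLReader reader =
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    // Character references arrive here already decoded.
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String text = parseText("<title>author1&#10;author2</title>");
        System.out.println(text.contains("\n"));     // true: a real newline
        System.out.println(text.contains("&#10;"));  // false: the reference is gone
    }
}
```

By the time the text reaches the document tree, there is nothing left that distinguishes an encoded newline from a literal one.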

On output, XMLWriter encodes only those characters that must be encoded, either because of XML rules or because of the encoding used for serialization (for example, UTF-8 or ISO-8859-1).

In this case, you have basically two options.

1) Subclass XMLWriter and completely override the characters() method, since whitespace handling is an integral part of that method. There is no other way to intercept a tab, newline, or carriage return. Here you must somehow keep track of where you are in the document and work out whether you are processing the right newline character.

2) Identify the newline character that you want to "re-escape" and replace it with a DefaultEntity("#10") node, setting the resolveEntityRefs property of XMLWriter to false. This option involves splitting the existing text node into two parts and inserting the entity node between them.

Option 2 seems to involve less work, but it is still cumbersome.
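
For what it's worth, option 2 could look roughly like the sketch below when the tree is built by hand, as in the question. It assumes dom4j (1.6.1 or later) is on the classpath; the class name EntityNewlineSketch and its write method are invented for the example. Note that it creates a new DefaultEntity for every line break rather than reusing one instance.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;
import org.dom4j.tree.DefaultEntity;

// Hypothetical helper class for illustration only.
class EntityNewlineSketch {

    /** Builds a title element and serializes it with encoded line breaks. */
    static String write(List<String> authors, String title) {
        Document doc = DocumentHelper.createDocument();
        Element accession = doc.addElement("title");
        for (String a : authors) {
            accession.addText(a);
            // A fresh entity node each time; a node can only have one parent.
            accession.add(new DefaultEntity("#10"));
        }
        accession.addText(title);

        StringWriter out = new StringWriter();
        try {
            XMLWriter writer = new XMLWriter(out, new OutputFormat());
            // Write entity nodes as references (&name;) instead of their text.
            writer.setResolveEntityRefs(false);
            writer.write(doc);
            writer.flush();
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(write(Arrays.asList("author1", "author2"), "Title"));
    }
}
```

With entity resolution switched off, the title element should come out with an &#10; reference after each author.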

UPDATE:

OK, it seems that you cannot add the same entity object twice. If you add a new instance of the object every time, it works. However, your case can also be fixed by adding xml:space="preserve" to your element.

    if (AUT.size() > 1) {
        for (String a : AUT) {
            accession.addText(a);
            accession.addText("\n");
        }
        accession.addText(title);
    }

and then

  accession.addAttribute(QName.get("space", Namespace.XML_NAMESPACE), "preserve"); 

In this case, your explicitly added line breaks should be preserved, regardless of the output format used when writing to xml.
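
Put together, a self-contained version of this might look like the following sketch. It again assumes dom4j is on the classpath, and PreserveSpaceSketch and write are names chosen just for the example. It deliberately uses the pretty-print format, which trims text by default, to show that the attribute takes precedence for this element.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.Namespace;
import org.dom4j.QName;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

// Hypothetical helper class for illustration only.
class PreserveSpaceSketch {

    /** Builds a title element with literal line breaks and xml:space="preserve". */
    static String write(List<String> authors, String title) {
        Document doc = DocumentHelper.createDocument();
        Element accession = doc.addElement("title");
        accession.addAttribute(
                QName.get("space", Namespace.XML_NAMESPACE), "preserve");
        for (String a : authors) {
            accession.addText(a);
            accession.addText("\n"); // literal newline, not an entity node
        }
        accession.addText(title);

        StringWriter out = new StringWriter();
        try {
            // Pretty print normally trims text; xml:space="preserve" opts out.
            XMLWriter writer = new XMLWriter(out, OutputFormat.createPrettyPrint());
            writer.write(doc);
            writer.flush();
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(write(Arrays.asList("author1", "author2"), "Title"));
    }
}
```

Keep in mind that this writes literal newline characters into the element, not &#10; references.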

Sorry for the confusion.
