When is it necessary to escape characters in XML?

When should we replace < > & " ' in XML with entities such as &lt; and so on?

I understand that this is done so that, if some of the XML content contains > or <, the parser will not treat it as the beginning or end of a tag.

Also, if I have XML:

 <hello>mor>ning<hello> 

does it need to be replaced by one of the following:

  • <hello>mor>ning<hello>
  • <hello>mor>ning<hello>
  • <hello>mor>ning<hello>

I don’t understand why a replacement is needed. When exactly is this needed and what exactly (tags or text) should be replaced?

+8
soap xml escaping
5 answers

< , > , & , " and ' all have special meanings in XML (for example, "start of entity" or "attribute value delimiter").

To have these characters treated as data (instead of with their special meaning), they can be represented by entities (&lt; for < , etc.).

Sometimes these special meanings are context sensitive (for example, " does not mean "attribute delimiter" outside of a tag), and there are places where the characters can appear raw as data. Rather than worrying about these exceptions, it is simplest to always represent them as entities whenever you want to avoid their special meaning. The only exception is explicit CDATA sections, where the special meanings do not apply (and & does not start an entity).
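As a rough illustration of the "always escape" approach, here is a minimal Python sketch using the standard library's xml.sax.saxutils. The sample string and the QUOTES mapping are made up for the example; they are not from the question.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical sample data containing all five special characters.
raw = 'Tom & Jerry: "1 < 2" and \'2 > 1\''

# escape() replaces &, < and > by default; that is enough for element content.
as_text = escape(raw)

# Inside an attribute value the quote characters matter too, so map them explicitly.
QUOTES = {'"': "&quot;", "'": "&apos;"}
as_attribute = escape(raw, QUOTES)

# unescape() reverses the process when given the inverse mapping.
restored = unescape(as_attribute, {v: k for k, v in QUOTES.items()})

print(as_text)
print(as_attribute)
print(restored == raw)
```

Note that every entity produced here ends with a semicolon, and that & is escaped first so that the ampersands introduced by the other replacements are not escaped a second time.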

does it need to be replaced by

It should not be represented as any of them. Entities must be terminated with a semicolon.

How you should represent it depends on which parts of your sample are data and which are markup. You did not say, for example, whether <hello> is supposed to be data or the start tag of a hello element.

+7

Section 2.4 of the XML specification clearly states:

The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and MUST, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.
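The rule quoted above can be checked against a real parser. The following is a small sketch using Python's xml.etree.ElementTree; the helper is_well_formed and the sample documents are invented for the illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = [
    "<a>x > y</a>",                     # raw > in content is allowed
    "<a>x < y</a>",                     # raw < in content is not
    "<a>x & y</a>",                     # raw & in content is not
    "<a>x &lt; y &amp; z</a>",          # escaped forms are fine
    "<a><![CDATA[x < y & z]]></a>",     # CDATA section: no escaping needed
    "<a><!-- x < y & z --></a>",        # comment: no escaping needed
]

results = {doc: is_well_formed(doc) for doc in samples}
for doc, ok in results.items():
    print("accepted" if ok else "rejected", doc)
```

Only the two documents with a raw < or & in element content should be rejected; the others fall under the exceptions the specification lists.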

+7

You have to escape all characters that have a special meaning in XML but should not be interpreted by the parser.

Assuming your XML is

 <hello>mor>ning</hello> 

you would encode it as

 <hello>mor&gt;ning</hello> 

or use a CDATA section [Wikipedia]:

 <hello><![CDATA[mor>ning]]></hello> 
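
To see that both forms carry the same data, here is a short Python sketch with xml.etree.ElementTree. It assumes <hello> is markup and mor>ning is the text content; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Parse the entity-escaped form and the CDATA form of the same element.
escaped = ET.fromstring("<hello>mor&gt;ning</hello>")
cdata = ET.fromstring("<hello><![CDATA[mor>ning]]></hello>")

# Both yield the same text once parsed.
print(escaped.text)
print(cdata.text)

# When building XML with a library, assign the raw text and let the
# serializer do the escaping instead of escaping by hand.
hello = ET.Element("hello")
hello.text = "mor>ning"
serialized = ET.tostring(hello, encoding="unicode")
print(serialized)
```

The practical consequence is that you escape the text, not the tags, and that a serializer normally does this for you.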
+4

A linked explanation covers this in more detail, but basically characters such as < and > are significant when an XML document is parsed. If extra occurrences of these special characters appear in the text or in an attribute of an XML node, the parser will not be able to understand the document correctly. If you submit XML to a web service, all special characters must be properly escaped.
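
For attributes, a sketch along the same lines (Python, xml.etree.ElementTree) shows that a value containing quotes, & and < survives a round trip when the serializer does the escaping. The element name msg and the attribute title are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute value with several special characters.
value = 'say "hi" & <bye>'

msg = ET.Element("msg", {"title": value})
msg.text = "a < b"

# The serializer escapes the attribute value and the text content.
payload = ET.tostring(msg, encoding="unicode")
print(payload)

# A receiving parser gets the original data back.
received = ET.fromstring(payload)
print(received.get("title") == value)
print(received.text)
```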

+1

https://github.com/savonrb/gyoku/blob/master/README.md

You can use Gyoku to avoid escaping characters inside CDATA.

+1