PHP XML output causes the parsing error "Entity 'rsquo' not defined"

Is there a function I can use to process any string so that it doesn't cause problems with XML parsing? I have a PHP script that outputs an XML file with content retrieved from forms.

The problem is that, even after the usual checks on strings from a PHP form, some of the user text causes XML parsing errors. I came across this character in particular: " ’ ". This is the error I get: Entity 'rsquo' not defined

Does anyone have experience encoding text for XML output?

Thanks!


Some clarifications: I am outputting content from forms to an XML file, which is subsequently parsed by JavaScript.

I process all form input like this: htmlentities(trim($_POST['content']), ENT_QUOTES, 'UTF-8');

When I want to output this content to an XML file, how can I encode it so that it does not cause XML parsing errors?

The following two solutions both work so far:

1) echo '<content><![CDATA['.$content.']]></content>';

2) echo '<content>'.htmlspecialchars(html_entity_decode($content, ENT_QUOTES, 'UTF-8'),ENT_QUOTES, 'UTF-8').'</content>'."\n";

Are these two solutions safe? Which one is better?

Thank you, sorry for not providing this information before.

+4
7 answers

You're approaching this the wrong way: don't look for a parser that doesn't give you errors. Instead, try to produce well-formed XML.

How did you get &rsquo; from the user? If they literally typed it in, you are not processing the input correctly: for example, you should escape & to &amp; . If you are the one who put the entity there (perhaps in place of some apostrophe), either define it in a DTD ( <!ENTITY rsquo "&#x2019;"> ) or write it using numeric notation ( &#x2019; ), since almost all of the named entities are part of HTML. XML defines only a few basic ones, as Gumbo pointed out.
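
To see the difference between a named HTML entity and a numeric character reference, here is a minimal sketch that checks both with PHP's DOM extension. The helper name isWellFormedXml() is made up for this illustration and is not from the original post; it assumes the dom/libxml extension is available.

```php
<?php
// Hypothetical helper (not from the original post): reports whether a string
// is well-formed XML according to PHP's DOM extension (libxml).
function isWellFormedXml(string $xml): bool
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

$named   = '<content>It&rsquo;s here</content>';  // HTML-only named entity
$numeric = '<content>It&#x2019;s here</content>'; // numeric character reference

var_dump(isWellFormedXml($named));   // expected false: 'rsquo' is not declared
var_dump(isWellFormedXml($numeric)); // expected true
```

Without a DTD that declares rsquo, only the five predefined entities (&amp; &lt; &gt; &quot; &apos;) and numeric references are available.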

EDIT based on additions to the question:

  • In #1, you wrap the content in a CDATA section, so if the user enters something like ]]> <°)))>< , you have a problem: the ]]> ends the section early.
  • In #2, you encode and then decode, which gives you back the original value of $content. The decoding should not be necessary (unless you expect users to post values like &amp; which should be interpreted as &).
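
If you do want to stay with CDATA, one common workaround for the ]]> problem is to split the terminator across two adjacent CDATA sections. This is only a sketch: cdataWrap() is a made-up helper name, not something from the question, and it does nothing about characters that are not allowed in XML at all (such as most control characters).

```php
<?php
// Hypothetical helper: wraps text in CDATA and splits every "]]>" so the
// terminator never appears inside a single CDATA section.
function cdataWrap(string $text): string
{
    // "]]>" becomes "]]" + end of section + start of new section + ">".
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $text);
    return '<![CDATA[' . $safe . ']]>';
}

$content = 'fish: ]]> <°)))><';
$xml = '<content>' . cdataWrap($content) . '</content>';

// Parse the result back and compare it with the original text.
$doc = new DOMDocument();
$doc->loadXML($xml);
var_dump($doc->documentElement->textContent === $content);
```

A parser reads the adjacent sections as one continuous run of text, so the consumer sees the original string.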

If you use htmlspecialchars() with ENT_QUOTES, you should be fine, but see how Drupal does it.

+8
 html_entity_decode($string, ENT_QUOTES, 'UTF-8') 
+4

Enclose the value in a CDATA section.

 <message><![CDATA[&rsquo;]]></message> 

From the w3schools website:

Characters such as "<" and "&" are illegal in XML elements.

"<" will generate an error because the parser interprets it as the beginning of a new element.

"&" will generate an error because the parser interprets it as the beginning of a character entity.

Some texts, such as JavaScript code, contain many "<" or "&" characters. To avoid script errors, the code can be defined as CDATA.

Everything inside the CDATA section is ignored by the parser.

+3

The problem is that your htmlentities() call does exactly what it is supposed to do: it generates HTML entities from characters. You then insert them into an XML document, which does not have the HTML entities defined (entities such as &rsquo; are HTML-specific).

The easiest way to handle this is to keep all the input raw (i.e. don't run it through htmlentities()) and then generate your XML using PHP's XML functions.

This ensures that all text is correctly encoded and your XML is correctly formed.

Example:

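 $user_input = "...<>&'";
 $doc = new DOMDocument('1.0', 'utf-8');
 $element = $doc->createElement("content");
 // createTextNode() takes raw text; special characters are escaped on output
 $element->appendChild($doc->createTextNode($user_input));
 $doc->appendChild($element);
 echo $doc->saveXML();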
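
As a quick sanity check of this approach, here is a sketch that builds the document from raw text, serializes it, and parses it back to confirm that the text survives the round trip. The function name buildContentXml() and the sample string are made up for the illustration.

```php
<?php
// Hypothetical helper: builds a one-element XML document from raw text.
function buildContentXml(string $rawText): string
{
    $doc = new DOMDocument('1.0', 'utf-8');
    $element = $doc->createElement('content');
    // The text node holds the raw string; escaping happens in saveXML().
    $element->appendChild($doc->createTextNode($rawText));
    $doc->appendChild($element);
    return $doc->saveXML();
}

$raw = "It’s <b>bold</b> & ]]> done";
$xml = buildContentXml($raw);
echo $xml;

// Parse the generated XML again and compare with the original input.
$check = new DOMDocument();
$check->loadXML($xml);
var_dump($check->documentElement->textContent === $raw);
```
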
+3

I had a similar problem: the data I needed to add to the XML was already being returned by my code with htmlentities() applied (it was not stored in the database that way).

I used:

 $doc = new DOMDocument('1.0', 'utf-8');
 $element = $doc->createElement("content");
 // Decode the HTML entities first, then escape only the XML special characters
 $xml_safe = htmlspecialchars(html_entity_decode($string, ENT_QUOTES, 'UTF-8'), ENT_XML1, 'UTF-8');
 $element->appendChild($doc->createElement('string', $xml_safe));
 $doc->appendChild($element);

Or, if the string has not been through htmlentities(), just the following should work:

 $doc = new DOMDocument('1.0', 'utf-8');
 $element = $doc->createElement("content");
 $xml_safe = htmlspecialchars($string, ENT_XML1, 'UTF-8');
 $element->appendChild($doc->createElement('string', $xml_safe));
 $doc->appendChild($element);

Basically, using htmlspecialchars() with ENT_XML1 should turn user-inputted data into XML-safe data (and it works fine for me):

 htmlspecialchars($string, ENT_XML1, 'UTF-8'); 
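
To make the decode-then-encode step concrete, here is a small sketch with a sample string. The helper name htmlEncodedToXml() is made up, and adding ENT_QUOTES to the flags is my own assumption (so the result is also usable inside attribute values); the snippet above uses ENT_XML1 alone.

```php
<?php
// Hypothetical helper: converts text that already went through htmlentities()
// into text that is safe inside an XML element or attribute.
function htmlEncodedToXml(string $htmlEncoded): string
{
    // Step 1: undo htmlentities(), so &rsquo; becomes the character U+2019.
    $raw = html_entity_decode($htmlEncoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    // Step 2: escape only the characters that XML treats as markup.
    return htmlspecialchars($raw, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

echo htmlEncodedToXml('Tom &amp; Jerry&rsquo;s &lt;show&gt;'), "\n";
// prints: Tom &amp; Jerry’s &lt;show&gt;
```

The right single quote comes out as a plain UTF-8 character, which XML accepts as long as the document encoding is declared as UTF-8 (or left at the default).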
+1

Using htmlspecialchars() will solve your problem. See the post below.

PHP: Is htmlentities() enough to create XML-safe values?

0
 htmlspecialchars(trim($_POST['content']), ENT_XML1, 'UTF-8');

That should do it.

0

Source: https://habr.com/ru/post/1314222/
