How to decode special characters in XML files?

In some XML files that I process (often RSS), I see text containing characters like Today's Newest, which becomes Today’s Newest after I extract the text from the node. This suggests that I am not handling the decoding correctly.

I could just patch my script to fix this one character, but what if many other characters get garbled as well? What is the proper way to read XML files without mangling the encoding when converting the content to UTF-8?

Here are some of the things I've tried that don't seem to work:

    $xml = file_get_contents($file);

    // One: still contains ’
    //$xml = @iconv('UTF-8', 'UTF-8//IGNORE', $xml);

    // Two: LibXMLError Entity 'rsquo' not defined
    //$xml = htmlentities($xml, null, 'UTF-8');
    //$xml = htmlspecialchars_decode($xml, ENT_QUOTES);

    // Three: still contains ’
    //$xml = mb_convert_encoding($xml, "UTF-8", "UTF-8");

    $xml = simplexml_load_string($xml, null, LIBXML_NOCDATA | LIBXML_NOENT);
2 answers

Check how you display your content. This can also happen if the output target does not handle UTF-8.

I assume you are outputting to a browser, so check the browser's encoding and try setting it explicitly to UTF-8. You may be getting the correct text from the XML, and it is simply not being displayed correctly.
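As a rough illustration of setting the encoding explicitly, the sketch below declares UTF-8 both in the HTTP header and in the page itself. The function name `renderPage` and the sample string are made up for this example; the real text would come from your parsed XML node.

```php
<?php
// Hypothetical helper: wraps already-extracted text in a minimal HTML page
// that declares UTF-8, so the browser does not have to guess the encoding.
function renderPage(string $text): string
{
    // Escape markup characters only; multibyte UTF-8 characters are left intact.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    return '<!DOCTYPE html><html><head><meta charset="UTF-8"></head>'
         . '<body>' . $safe . '</body></html>';
}

// The header must be sent before any output reaches the browser.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Sample text containing U+2019 (right single quotation mark).
echo renderPage("Today\u{2019}s Newest");
```

If the text looks right with this in place, the XML parsing was probably fine all along and only the display side was wrong.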

If the above does not help, also try loading the XML with DOMDocument.
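A minimal sketch of the DOMDocument route, assuming the feed has a `title` element and a correct encoding declaration. The function name `extractTitle` and the sample feed are hypothetical.

```php
<?php
// Hypothetical helper: parses an XML string with DOMDocument and returns
// the text of the first <title> element.
function extractTitle(string $xml): string
{
    $doc = new DOMDocument();

    // loadXML() returns false when the document cannot be parsed.
    if (!$doc->loadXML($xml, LIBXML_NOCDATA)) {
        throw new RuntimeException('Could not parse XML');
    }

    $node = $doc->getElementsByTagName('title')->item(0);
    if ($node === null) {
        throw new RuntimeException('No <title> element found');
    }

    // The DOM extension hands node text back as UTF-8.
    return $node->textContent;
}

// Made-up feed fragment containing U+2019 (right single quotation mark).
$feed = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
      . "<rss><channel><title>Today\u{2019}s Newest</title></channel></rss>";

$title = extractTitle($feed);
```

Dumping `bin2hex($title)` is a quick way to tell whether the bytes are correct: a properly decoded U+2019 shows up as `e28099`.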


Try:

    $xml = simplexml_load_string($xml, null, LIBXML_NOCDATA | LIBXML_NOENT);
    $xml->addAttribute('encoding', 'UTF-8');

