How to decode special characters in XML files?

Question


In some XML files that I process (often RSS), I see text containing characters like Today’s Newest, which becomes Todayâ€™s Newest after I extract the text from the node. This suggests that I am not handling the decoding correctly.

I could just patch my script to fix this one case, but what if many other characters get garbled too? What is the proper way to digest XML files without mangling the encoding when converting the content for a UTF-8 script?

Here are some of the things I've tried that don't seem to work:

    $xml = file_get_contents($file);

    // One: still contains â€™
    //$xml = @iconv('UTF-8', 'UTF-8//IGNORE', $xml);

    // Two: LibXMLError Entity 'rsquo' not defined
    //$xml = htmlentities($xml, null, 'UTF-8');
    //$xml = htmlspecialchars_decode($xml, ENT_QUOTES);

    // Three: still contains â€™
    //$xml = mb_convert_encoding($xml, "UTF-8", "UTF-8");

    $xml = simplexml_load_string($xml, null, LIBXML_NOCDATA | LIBXML_NOENT);

+4

xml php unicode character-encoding libxml2

Xeoncross Aug 9 '12 at 15:14


2 answers

Try:

    $xml = simplexml_load_string($xml, null, LIBXML_NOCDATA | LIBXML_NOENT);
    $xml->addAttribute('encoding', 'UTF-8');

+1

Kalpesh Aug 9 '12 at 15:22


zysoft · Accepted Answer · 2012-08-09T15:29:59+0000

Check how you display your content. This can also happen if the output target does not support UTF-8 or is not told that the text is UTF-8.
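One way to check whether the extracted text is actually fine is to look at its bytes rather than at how it is rendered. The sketch below assumes a made-up sample feed (the `$feed` string is not from the question) and that the mbstring extension is available. It shows that the apostrophe comes out of SimpleXML as valid UTF-8, and that reading those same bytes as Windows-1252 reproduces the garbled text.

```php
<?php
// Hypothetical sample feed; \u{2019} is the right single quotation mark, stored as UTF-8.
$feed = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      . "<rss><channel><title>Today\u{2019}s Newest</title></channel></rss>";

$xml   = simplexml_load_string($feed);
$title = (string) $xml->channel->title;

// SimpleXML hands back UTF-8, so the apostrophe should appear as the bytes e2 80 99.
echo bin2hex($title), "\n";

// Reinterpreting those same bytes as Windows-1252 yields the garbled form,
// which is what a viewer does when it assumes the wrong charset.
$garbled = mb_convert_encoding($title, 'UTF-8', 'Windows-1252');
echo $garbled, "\n";
```

If the hex dump contains e28099, the parsing step is probably not the culprit and the problem is more likely in how the output is displayed.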

I assume that you are outputting to the browser, so check the browser's encoding and try explicitly setting it to UTF-8. You may well be getting the correct text from the XML, and it is just not being displayed correctly.
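Setting the encoding explicitly could look something like the following. This is only a sketch: `renderPage()` is a hypothetical helper, and the hard-coded title stands in for whatever text was extracted from the XML. It declares UTF-8 both in the HTTP header and in the page itself.

```php
<?php
// Hypothetical helper: wraps already-decoded UTF-8 text in a page that declares its charset.
function renderPage(string $text): string
{
    return '<!DOCTYPE html><html><head><meta charset="UTF-8"></head><body>'
         . htmlspecialchars($text, ENT_QUOTES, 'UTF-8')
         . '</body></html>';
}

// Send the charset in the HTTP header too; skip this on the CLI or if output has already started.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

echo renderPage("Today\u{2019}s Newest");
```

The header has to be sent before any output, which is why it comes before the `echo`.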

Also try loading the XML with DOMDocument if the above doesn't help.
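A minimal sketch of the DOMDocument route, again using a made-up sample feed in place of the real file contents. DOMDocument reads the encoding from the XML declaration and returns node text as UTF-8.

```php
<?php
// Hypothetical sample feed declared as UTF-8; in practice this would come from file_get_contents($file).
$feed = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      . "<rss><channel><title>Today\u{2019}s Newest</title></channel></rss>";

$dom = new DOMDocument();

// loadXML() returns false when the document cannot be parsed.
if (!$dom->loadXML($feed)) {
    throw new RuntimeException('Could not parse the feed');
}

// Text read from DOM nodes is UTF-8, regardless of the encoding the source declared.
$title = $dom->getElementsByTagName('title')->item(0)->textContent;
echo $title, "\n";
```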
