Disable HTML entity encoding in PHP DOMDocument

I cannot figure out how to stop DOMDocument from mangling these characters.

    <?php
    $doc = new DOMDocument();
    $doc->substituteEntities = false;
    $doc->loadHTML('<p>¯\(°_o)/¯</p>');
    print_r($doc->saveHTML());
    ?>

Expected Result: ¯\(°_o)/¯

Actual Output: &Acirc;&macr;\(&Acirc;&deg;_o)/&Acirc;&macr;

http://codepad.org/W83eHSsT

2 answers

I found a hint in the comments on http://php.net/manual/en/domdocument.loadhtml.php

(Comment from mdmitry at gmail dot com, 21-Dec-2009 05:02: "You can also load HTML as UTF-8 with this simple hack:")

Just add '<?xml encoding="UTF-8">' in front of the HTML before loading it:

    $doc = new DOMDocument();
    //$doc->substituteEntities = false;
    $doc->loadHTML('<?xml encoding="UTF-8">' . '<p>¯\(°_o)/¯</p>');
    print_r($doc->saveHTML());
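
As background on why the prefix helps: the output in the question looks like what you get when the UTF-8 bytes of the input are read as ISO-8859-1, which would fit a parser that falls back to ISO-8859-1 when no encoding is declared. The sketch below reproduces that entity string without DOMDocument at all. It assumes the mbstring extension is available and that the script file itself is saved as UTF-8; the variable names are only illustrative.

```php
<?php
// UTF-8 source text, as in the question.
$original = '¯\(°_o)/¯';

// Treat every UTF-8 byte as one ISO-8859-1 character and re-encode as UTF-8.
// "¯" is the byte pair C2 AF, so it becomes the two characters "Â" and "¯".
$misread = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Turn the misread characters into named HTML entities.
$entities = htmlentities($misread, ENT_QUOTES, 'UTF-8');

echo $entities, PHP_EOL; // &Acirc;&macr;\(&Acirc;&deg;_o)/&Acirc;&macr;
```

Since this matches the "Actual Output" above, the problem is probably the input encoding guess rather than the entity substitution, which would explain why substituteEntities makes no difference.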

Add

    <?xml version="1.0" encoding="utf-8">

at the top of the document and the characters are handled correctly by both saveXML and saveHTML.
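
A minimal sketch of how the prefix could be wrapped up for reuse, assuming the dom extension is loaded. The function name loadUtf8Html is hypothetical, and the exact serialization (raw UTF-8 characters versus named entities such as &macr;) can differ between libxml2 versions, so the example decodes the output before comparing it.

```php
<?php
// Hypothetical helper: load an HTML fragment as UTF-8 using the
// '<?xml encoding="UTF-8">' prefix described above.
function loadUtf8Html(string $html): DOMDocument
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

$doc = loadUtf8Html('<p>¯\(°_o)/¯</p>');

// Serialize only the <p> element rather than the whole wrapped document.
$p = $doc->getElementsByTagName('p')->item(0);
$fragment = $doc->saveHTML($p);

// Decode any named entities so the comparison works either way.
$decoded = html_entity_decode($fragment, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $fragment, PHP_EOL;
```

Passing a node to saveHTML() keeps the doctype, html and body wrappers that loadHTML adds out of the result.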
