PHP - DOMDocument::saveHTML creates strange entities

I am pulling XHTML from an API, and my goal is to save this XHTML as HTML in a file for users to view.

The problem is that strange new entities appear in the saved HTML file that should not be there. Here is an example.

Here's what the pulled xhtml snippet looks like:

<p>    "At that point

And here is what the saved file looks like:

<p>&Acirc;&nbsp;&Acirc;&nbsp;&Acirc;&nbsp; "At that point

And here is what Chrome sees:

<p>Â&nbsp;Â&nbsp;Â&nbsp; "At that point

Between being pulled and being saved, the XHTML passes through several different classes, so I have condensed all of that into the following code for simplicity.

//curl call is initialized here

$raw = curl_exec($ch);

$simplexml = simplexml_load_string($raw);

$xmlstr = $simplexml->xpath($xpath)->asXML();

$html = new DOMDocument;
$html->formatOutput = true;
$wrapper = $html->createElement("div");
$wrapper->setAttribute("id", "wrapper");
$wrapper = $html->appendChild($wrapper);

$content = DOMDocument::loadHTML($xmlstr, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach($content->firstChild->childNodes as $node)
    $wrapper->appendChild($html->importNode($node, TRUE));

$htmlstr = $html->saveHTML();


$html = new DOMDocument;
$html->formatOutput = true;

$content = DOMDocument::loadHTML($htmlstr, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach($content->childNodes as $node)
    $html->appendChild($html->importNode($node, TRUE));

$html_str = $html->saveHTML();

file_put_contents($content_path, $html_str);

Yes, it is a little convoluted, but the data gets passed around quite a bit because a lot has to be added to it along the way.

I just don’t understand where these new entities come from. Any help would be appreciated.


With SimpleXML you get XML:

$xmlstr = $simplexml->xpath($xpath)->asXML();
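As a minimal sketch of that step: `SimpleXMLElement::xpath()` returns an array of matching elements, so a single match has to be picked before calling `asXML()`. The sample document and the `//body` expression below are made up for illustration; the real `$raw` and `$xpath` come from the API call in the question.

```php
<?php
// Hypothetical stand-in for the API response (assumed to be UTF-8 XML).
$raw = '<?xml version="1.0" encoding="UTF-8"?><root><body><p>At that point</p></body></root>';

$simplexml = simplexml_load_string($raw);

// xpath() returns an array of SimpleXMLElement objects (possibly empty).
$matches = $simplexml->xpath('//body');
if (!$matches) {
    throw new RuntimeException('XPath expression matched nothing');
}

// Serialize the first match; the result is an XML string, not HTML.
$xmlstr = $matches[0]->asXML();
```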

That XML, which is not HTML, is then fed into DOMDocument with loadHTML:

$content = DOMDocument::loadHTML($xmlstr, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
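One plausible explanation for the `&Acirc;&nbsp;` pairs, assuming the API delivers UTF-8 and the indentation consists of no-break spaces (U+00A0): when no charset is declared, `loadHTML` has traditionally treated its input as ISO-8859-1, so each two-byte sequence `C2 A0` is read as two characters, `Â` followed by a no-break space. The sketch below reproduces that misreading with `iconv` and `htmlentities` alone, without involving DOMDocument.

```php
<?php
// U+00A0 (no-break space) encoded as UTF-8 is the two bytes C2 A0.
$nbsp = "\xC2\xA0";

// Reinterpret those bytes as ISO-8859-1 and convert them to UTF-8:
// byte C2 becomes "Â" and byte A0 becomes a no-break space.
$misread = iconv('ISO-8859-1', 'UTF-8', $nbsp);

// Entity-encode the result, similar to what an HTML serializer does.
$entities = htmlentities($misread, ENT_QUOTES, 'UTF-8');

echo $entities, PHP_EOL; // &Acirc;&nbsp;
```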

Instead, use loadXML rather than loadHTML:

// loadXML is an instance method; the LIBXML_HTML_* flags apply only to the HTML parser, so they are dropped here
$content = new DOMDocument;
$content->loadXML($xmlstr);

This produces:

<p>&nbsp;&nbsp;&nbsp; "At that point
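Put together, the first half of the pipeline might look like the sketch below. It assumes the fragment is well-formed UTF-8 XML; the `$xmlstr` literal is a hypothetical stand-in for what `asXML()` returns, and `documentElement` is used here in place of the question's `firstChild` to reach the fragment's root element.

```php
<?php
// Hypothetical stand-in for the string returned by asXML(); assumed UTF-8.
// "\u{00A0}" is a no-break space.
$xmlstr = "<div><p>\u{00A0}\u{00A0}\u{00A0} \"At that point</p></div>";

// Parse the fragment as XML, which defaults to UTF-8 when no encoding is declared.
$content = new DOMDocument();
$content->loadXML($xmlstr);

// Build the target document with its wrapper element, as in the question.
$html = new DOMDocument();
$html->formatOutput = true;
$wrapper = $html->createElement('div');
$wrapper->setAttribute('id', 'wrapper');
$html->appendChild($wrapper);

// Copy every child of the fragment's root element into the wrapper.
foreach ($content->documentElement->childNodes as $node) {
    $wrapper->appendChild($html->importNode($node, true));
}

$htmlstr = $html->saveHTML();
echo $htmlstr;
```

Note that `loadXML` rejects input that is not well-formed XML, so this only fits when the API really returns XHTML.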



