Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity

Question


    $html = file_get_contents("http://www.somesite.com/");
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    echo $dom;

throws

    Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity,
    Catchable fatal error: Object of class DOMDocument could not be converted to string in test.php on line 10


php

gweg Nov 06 '09 at 3:40


12 answers

Dewsworld · Answer 1 · 2012-05-07 13:05

To suppress the warning, you can use libxml_use_internal_errors(true):

    // Create a new DOMDocument
    $document = new \DOMDocument('1.0', 'UTF-8');

    // Buffer libxml errors internally and remember the previous setting
    $internalErrors = libxml_use_internal_errors(true);

    // Load the HTML
    $document->loadHTML($html);

    // Restore the previous error handling setting
    libxml_use_internal_errors($internalErrors);

mattalxndr · Answer 2 · 2011-02-23 01:24

I would wager that if you view the source of http://www.somesite.com/, you will find special characters that have not been converted to HTML entities. Maybe something like this:

 <a href="/script.php?foo=bar&hello=world">link</a>

It should be:

 <a href="/script.php?foo=bar&amp;hello=world">link</a>
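One way to check this is to parse both variants and compare what libxml reports. The following is only a minimal sketch: the helper name collectParseErrors is made up for illustration, and which messages actually appear depends on the libxml2 version your PHP build uses.

```php
<?php
// Hypothetical helper: returns the libxml messages produced while parsing $html.
function collectParseErrors(string $html): array
{
    // Buffer errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }

    // Clean up and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

$raw     = '<a href="/script.php?foo=bar&hello=world">link</a>';
$escaped = '<a href="/script.php?foo=bar&amp;hello=world">link</a>';

print_r(collectParseErrors($raw));     // may include the htmlParseEntityRef message
print_r(collectParseErrors($escaped)); // the escaped variant should not trigger it
```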

Maanas Royy · Answer 3 · 2010-10-16 05:28

 $dom->@loadHTML($html);

This is incorrect. Use this instead:

 @$dom->loadHTML($html);

Mike B · Answer 4 · 2009-11-06 03:46

The reason for your fatal error is that DOMDocument does not have a __toString() method and therefore cannot be echoed.

You may be looking for

 echo $dom->saveHTML();
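For illustration, here is a minimal sketch of the question's script with the serialization step made explicit. The inline $html string is an assumed stand-in for the fetched page, and the libxml calls are only there to keep the parser warning quiet.

```php
<?php
// Assumed sample markup standing in for file_get_contents("http://www.somesite.com/").
$html = '<html><body><a href="/script.php?foo=bar&hello=world">link</a></body></html>';

// Buffer parser errors so the unescaped '&' does not emit a warning.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

libxml_clear_errors();
libxml_use_internal_errors($previous);

// DOMDocument cannot be cast to a string, so serialize it explicitly.
$output = $dom->saveHTML();
echo $output;
```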

user279583 · Answer 5 · 2010-02-27 06:43

There are 2 errors: the second is because $dom is not a string but an object, and therefore cannot be echoed. The first is a warning from loadHTML, caused by invalid syntax in the HTML document being loaded (probably an & (ampersand) used as a parameter separator and not masked as an entity with &amp;).

You can ignore and suppress this error message (not the error, just the message!) by calling the function with the error control operator "@" ( http://www.php.net/manual/en/language.operators.errorcontrol.php ):

    @$dom->loadHTML($html);

Lorenz Lo Sauer · Answer 6 · 2011-09-12 10:43

Regardless of the echo (which should be replaced by print_r or var_dump), if an exception is thrown, the object should remain empty:

 DOMNodeList Object ( )

Solution

Set recover to true and strictErrorChecking to false:

    $content = file_get_contents($url);
    $doc = new DOMDocument();
    $doc->recover = true;
    $doc->strictErrorChecking = false;
    $doc->loadHTML($content);

Use PHP's entity encoding on the content of the markup; unencoded content is the most common source of these errors.
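As a minimal sketch of what entity encoding looks like when you generate the markup yourself, the following builds a link and escapes both the URL and the label. The parameter names and the label are assumed example data.

```php
<?php
// Assumed example data; the names are only for illustration.
$params = ['foo' => 'bar', 'hello' => 'world'];
$label  = 'Fish & Chips';

// Join the parameters with a raw '&' (passed explicitly as the separator).
$url = '/script.php?' . http_build_query($params, '', '&');

// Escape every value before placing it into an attribute or a text node.
$link = sprintf(
    '<a href="%s">%s</a>',
    htmlspecialchars($url, ENT_QUOTES, 'UTF-8'),
    htmlspecialchars($label, ENT_QUOTES, 'UTF-8')
);

echo $link, PHP_EOL;
// prints: <a href="/script.php?foo=bar&amp;hello=world">Fish &amp; Chips</a>
```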

David Chan · Answer 7 · 2014-09-16 22:32

Replace the simple

    $dom->loadHTML($html);

with the more robust

    libxml_use_internal_errors(true);
    if (!$dom->loadHTML($html)) {
        // Collect every message libxml buffered during parsing.
        $errors = "";
        foreach (libxml_get_errors() as $error) {
            $errors .= $error->message . "<br/>";
        }
        libxml_clear_errors();
        print "libxml errors:<br>$errors";
        return;
    }

nmwi22 · Answer 8 · 2017-11-22 11:19

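    $html = file_get_contents("http://www.somesite.com/");
    $dom = new DOMDocument();
    // Note: htmlspecialchars() escapes the tags as well, so the page is loaded as text.
    $dom->loadHTML(htmlspecialchars($html));
    echo $dom->saveHTML(); // echo $dom; would raise the fatal error from the question

Try this.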

lastYorsh · Answer 9 · 2013-10-22 18:57

Another possible solution is

    $sContent = htmlspecialchars($sHTML);
    $oDom = new DOMDocument();
    $oDom->loadHTML($sContent);
    echo html_entity_decode($oDom->saveHTML());

Nicolas Bouvrette · Answer 10 · 2015-02-15 14:02

I know this is an old question, but if you ever want to fix the malformed '&' signs in your HTML, you can use code similar to this:

    $page = file_get_contents('http://www.example.com');
    $page = preg_replace('/\s+/', ' ', trim($page)); // Collapse runs of whitespace.
    fixAmps($page, 0);
    $dom = new DOMDocument();
    $dom->loadHTML($page);

    function fixAmps(&$html, $offset) {
        $positionAmp = strpos($html, '&', $offset);
        if ($positionAmp === false) {
            return; // No further '&' can be found.
        }
        $positionSemiColumn = strpos($html, ';', $positionAmp + 1);
        if ($positionSemiColumn === false) { // If no ';' can be found.
            $html = substr_replace($html, '&amp;', $positionAmp, 1); // Replace straight away.
            fixAmps($html, $positionAmp + 5); // Later '&' signs have no ';' either.
            return;
        }
        $string = substr($html, $positionAmp, $positionSemiColumn - $positionAmp + 1);
        if (preg_match('/^&(#[0-9]+|[A-Za-z0-9]+);$/', $string) === 0) { // If a standard escape cannot be found.
            $html = substr_replace($html, '&amp;', $positionAmp, 1); // This means we need to escape the '&' sign.
            fixAmps($html, $positionAmp + 5); // Recursive call from the new position.
        } else {
            fixAmps($html, $positionAmp + 1); // Recursive call from the new position.
        }
    }
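If recursion is a concern on large pages, a single regular expression with a negative lookahead might do a similar job. This is a different technique from the function above, not a drop-in copy of it: the helper name escapeBareAmpersands is hypothetical, and the pattern is a sketch that ignores special contexts such as script blocks.

```php
<?php
// Hypothetical helper: escapes every '&' that does not already start a
// named, decimal, or hexadecimal character reference.
function escapeBareAmpersands(string $html): string
{
    return preg_replace(
        '/&(?!(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);)/',
        '&amp;',
        $html
    );
}

$fixed = escapeBareAmpersands('<a href="/script.php?foo=bar&hello=world">Q&A &amp; &#38;</a>');
echo $fixed, PHP_EOL;
// prints: <a href="/script.php?foo=bar&amp;hello=world">Q&amp;A &amp; &#38;</a>
```

Existing references such as &amp; and &#38; are left alone, so running the helper twice gives the same result as running it once.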

ananda · Answer 11 · 2016-05-29 10:05

This is not always due to the content of the page; it may be due to the URL itself.

I recently ran into this error, and the cause was a carriage return character at the end of the URL. The character was there because of a mistake in how the URLs were split. Use

    $urls_array = explode("\r\n", $urls);

instead of

    $urls_array = explode("\n", $urls);
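If the list can contain mixed line endings, a more forgiving split may help. This is a minimal sketch with an assumed sample list; it uses the \R escape of PCRE, which matches any of the common line break sequences.

```php
<?php
// Assumed input: a URL list with mixed line endings.
$urls = "http://www.example.com/a\r\nhttp://www.example.com/b\nhttp://www.example.com/c\r\n";

// \R matches \r\n, \n, or \r; PREG_SPLIT_NO_EMPTY drops empty entries.
$urls_array = preg_split('/\R/', $urls, -1, PREG_SPLIT_NO_EMPTY);

// trim() removes any remaining surrounding whitespace from each URL.
$urls_array = array_map('trim', $urls_array);

print_r($urls_array);
```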

FRANK · Answer 12 · 2019-06-21 05:38

Another possible cause: maybe your file is saved as an ASCII file. If so, just change the type of your file.
