If I try to load an HTML document into the PHP DOM, I get an error message:
Error DOMDocument::loadHTML() [domdocument.loadhtml]: ID someAnchor already defined in Entity, line: 9
I can’t understand why. Here is the code that loads the HTML strings into the DOM.
The first document has no anchor and the second one does. The error appears only for the second document.
You should be able to copy and paste this into a script and run it to see the same output:
<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);
$stringWithNoAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<h1>Hello</h1>
</body>
</html>
EOT;
$stringWithAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<h1>Hello</h1>
<a name="someAnchor" id="someAnchor"></a>
</body>
</html>
EOT;
class domGrabber
{
    public $_FileErrorStr = '';

    public function getLoadAsDOMObj($htmlString)
    {
        $this->_FileErrorStr = '';
        $xmlDoc = new DOMDocument();
        set_error_handler(array($this, '_FileErrorHandler'));
        $xmlDoc->loadHTML($htmlString);
        restore_error_handler();
        return $xmlDoc;
    }

    public function _FileErrorHandler($errno, $errstr, $errfile, $errline)
    {
        if ($this->_FileErrorStr === null)
        {
            $this->_FileErrorStr = $errstr;
        }
        else {
            $this->_FileErrorStr .= (PHP_EOL . $errstr);
        }
    }
}

$domGrabber = new domGrabber();
$xmlDoc = $domGrabber->getLoadAsDOMObj($stringWithNoAnchor);
echo 'PHP Version: '. phpversion() .'<br />'."\n";
echo '<pre>';
print $xmlDoc->saveXML();
echo '</pre>'."\n";
if ($domGrabber->_FileErrorStr)
{
    echo 'Error'. $domGrabber->_FileErrorStr;
}

$xmlDoc = $domGrabber->getLoadAsDOMObj($stringWithAnchor);
echo '<pre>';
print $xmlDoc->saveXML();
echo '</pre>'."\n";
if ($domGrabber->_FileErrorStr)
{
    echo 'Error'. $domGrabber->_FileErrorStr;
}
This is what I get when I view the page source in Firefox:
PHP Version: 5.2.9<br />
<pre><?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml"><head><title>My document</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /></head><body>
<h1>Hello</h1>
</body></html>
</pre>
<pre><?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml"><head><title>My document</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /></head><body>
<h1>Hello</h1>
<a name="someAnchor" id="someAnchor"></a>
</body></html>
</pre>
Error
DOMDocument::loadHTML() [<a href='domdocument.loadhtml'>domdocument.loadhtml</a>]: ID someAnchor already defined in Entity, line: 9
So why does the DOM say someAnchor is already defined?
Update:
I experimented with two changes, and each of them makes the error go away:
- Using the loadXML() method instead of loadHTML().
- Using only the id attribute instead of both id and name.
For completeness, here is a script that compares the variants:
<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);
$stringWithNoAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<p>stringWithNoAnchor</p>
</body>
</html>
EOT;
$stringWithAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<p>stringWithAnchor</p>
<a name="someAnchor" id="someAnchor" ></a>
</body>
</html>
EOT;
$stringWithAnchorButOnlyIdAtt = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<p>stringWithAnchorButOnlyIdAtt</p>
<a id="someAnchor"></a>
</body>
</html>
EOT;
class domGrabber
{
    public $_FileErrorStr = '';
    public $useHTMLMethod = TRUE;

    public function loadDOMObjAndWriteOut($htmlString)
    {
        $this->_FileErrorStr = '';
        $xmlDoc = new DOMDocument();
        set_error_handler(array($this, '_FileErrorHandler'));
        if ($this->useHTMLMethod)
        {
            $xmlDoc->loadHTML($htmlString);
        }
        else {
            $xmlDoc->loadXML($htmlString);
        }
        restore_error_handler();
        echo "<h1>";
        echo ($this->useHTMLMethod) ? 'using $xmlDoc->loadHTML()' : 'using $xmlDoc->loadXML()';
        echo "</h1>";
        echo '<pre>';
        print $xmlDoc->saveXML();
        echo '</pre>'."\n";
        if ($this->_FileErrorStr)
        {
            echo 'Error'. $this->_FileErrorStr;
        }
    }

    public function _FileErrorHandler($errno, $errstr, $errfile, $errline)
    {
        if ($this->_FileErrorStr === null)
        {
            $this->_FileErrorStr = $errstr;
        }
        else {
            $this->_FileErrorStr .= (PHP_EOL . $errstr);
        }
    }
}
$domGrabber = new domGrabber();
echo 'PHP Version: '. phpversion() .'<br />'."\n";
$domGrabber->useHTMLMethod = TRUE;
$domGrabber->loadDOMObjAndWriteOut($stringWithNoAnchor);
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchor );
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchorButOnlyIdAtt);
$domGrabber->useHTMLMethod = FALSE;
$domGrabber->loadDOMObjAndWriteOut($stringWithNoAnchor);
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchor );
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchorButOnlyIdAtt);
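As an aside, the custom error handler in the scripts above could probably be replaced by libxml's own error buffering. The following is only a minimal sketch of that idea. It assumes the libxml functions libxml_use_internal_errors(), libxml_get_errors() and libxml_clear_errors() are available, and the helper name collectLoadHtmlMessages is made up for illustration; it is not part of the scripts above. Which messages come back depends on the PHP and libxml2 versions installed.

```php
<?php
// Hypothetical helper (not part of the original scripts): loads an HTML string
// and returns the libxml diagnostics as an array of message strings.
function collectLoadHtmlMessages($htmlString)
{
    // Buffer libxml errors internally instead of raising PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadHTML($htmlString);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError object exposing message, line, column, etc.
        $messages[] = 'line ' . $error->line . ': ' . trim($error->message);
    }

    // Restore the libxml error mode that was active before the call.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// Same anchor as in the question: name and id carry the same value.
$html = '<html><body><a name="someAnchor" id="someAnchor"></a></body></html>';
foreach (collectLoadHtmlMessages($html) as $message) {
    echo $message, PHP_EOL;
}
```

Compared with set_error_handler(), this keeps the line number of each diagnostic as structured data and does not intercept unrelated PHP warnings raised while the document is loading.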