PHP DOMDocument adds <html> tags and a DOCTYPE declaration

I append a #b hash to the href of every link using the DOMDocument class.

    $dom = new DOMDocument();
    $dom->loadHTML($output);
    $a_tags = $dom->getElementsByTagName('a');
    foreach ($a_tags as $a) {
        $value = $a->getAttribute('href');
        $a->setAttribute('href', $value . '#b');
    }
    return $dom->saveHTML();

This works fine, but the returned output includes a DOCTYPE declaration and <head> and <body> tags. Any idea why this happens, or how I can prevent it?

4 answers

Yes, that is what DOMDocument::saveHTML() does: it generates a full HTML document, with a DOCTYPE declaration, <head>, ...

Two possible solutions:

  • If you are working with PHP >= 5.3.6, saveHTML() accepts an optional node parameter, so you can serialize only the node(s) you need instead of the whole document.
  • If you need your code to work with PHP < 5.3.6, you will have to strip the unwanted markup yourself, using a few str_replace() calls, a regular expression, or anything equivalent.
    • See the user-contributed notes on the manual page for an example.
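As a rough illustration of the first option, the sketch below serializes only the children of <body>, so the DOCTYPE and the wrapper tags never reach the output. The function name addHashToLinks() is hypothetical (it does not appear in the question), and the code assumes PHP >= 5.3.6 and that the parser has wrapped the input in a <body> element.

```php
<?php
// Hypothetical helper: appends "#b" to every link and returns only the
// markup that ends up inside <body>. Assumes PHP >= 5.3.6.
function addHashToLinks($html)
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $a) {
        $a->setAttribute('href', $a->getAttribute('href') . '#b');
    }

    // loadHTML() wraps fragments in <html><body>; take the first <body>.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize each child node on its own, skipping the wrapper tags.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo addHashToLinks('<p><a href="/page">link</a></p>');
```

Serializing node by node keeps the document itself untouched, so the same DOMDocument can still be used for further processing afterwards.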

The real problem is how the document is loaded. Pass these flags to loadHTML() instead (the options argument requires PHP 5.4 or later):

    $html->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

See the original answer here.
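Applied to the code from the question, this approach might look like the sketch below. The function name appendHashNoWrapper() is hypothetical, and the sketch assumes PHP >= 5.4 with a libxml build that supports both flags. It also assumes the input has a single root element; with LIBXML_HTML_NOIMPLIED, input that has several top-level elements is reported to come back restructured, so test with your own markup.

```php
<?php
// Hypothetical helper: loads the markup without the implied <html>/<body>
// wrapper and without a default DOCTYPE, then appends "#b" to every link.
// Assumes PHP >= 5.4 and input with a single root element.
function appendHashNoWrapper($html)
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $a) {
        $a->setAttribute('href', $a->getAttribute('href') . '#b');
    }

    // Nothing was added at load time, so there is nothing to strip here.
    return $dom->saveHTML();
}

echo appendHashNoWrapper('<div><a href="/page">link</a></div>');
```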


Calling $doc->saveHTML(false) will not work: it raises an error because the method expects a node, not a bool.

The solution I used:

    return preg_replace(
        '/^<!DOCTYPE.+?>/',
        '',
        str_replace(
            array('<html>', '</html>', '<body>', '</body>'),
            array('', '', '', ''),
            $doc->saveHTML()
        )
    );

I am using PHP > 5.4.


I solved this problem by creating a new DOMDocument and copying the child nodes from the source document into the new one.

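    function removeDocType($oldDom) {
        // documentElement is <html>; its first child is assumed to be <body>.
        $node = $oldDom->documentElement->firstChild;
        $dom = new DOMDocument();
        foreach ($node->childNodes as $child) {
            // importNode() must be called on the target document.
            $dom->appendChild($dom->importNode($child, true));
        }
        return $dom->saveHTML();
    }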

So instead of

 return $dom->saveHTML(); 

I use:

 return removeDocType($dom); 