DOM: select all text nodes in a document (PHP)

I have the following PHP code, which traverses the entire DOM document to collect all of its text nodes. It is a rather ugly solution, and I'm sure there must be a better way. Is there one?

    $skip  = false;      // true right after climbing back up to a parent
    $node  = $document;
    $nodes = array();

    while ($node) {
        if ($node->nodeType == XML_TEXT_NODE) {   // XML_TEXT_NODE === 3
            $nodes[] = $node;
        }
        if (!$skip && $node->firstChild) {
            $node = $node->firstChild;            // descend
        } elseif ($node->nextSibling) {
            $node = $node->nextSibling;           // move sideways
            $skip = false;
        } else {
            $node = $node->parentNode;            // climb; do not descend again
            $skip = true;
        }
    }

Thanks.

2 answers

The XPath expression you need is //text(). Try using it with DOMXPath::query. For instance:

    $xpath = new DOMXPath($doc);
    $textnodes = $xpath->query('//text()');
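For a fuller picture, here is a minimal, self-contained sketch of the same idea. The sample markup and the variable names $html and $texts are made up for illustration; only DOMDocument, DOMXPath and the //text() expression come from the answer above.

```php
<?php
// Hypothetical sample markup; any HTML string could be used instead.
$html = '<html><body><p>Hello <b>world</b></p><p>Bye</p></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// query() returns a DOMNodeList; each item here is a text node.
$texts = array();
foreach ($xpath->query('//text()') as $textNode) {
    $texts[] = $textNode->nodeValue;
}

print_r($texts);
```

If whitespace-only text nodes get in the way, the expression //text()[normalize-space()] should filter them out, since an empty normalized string evaluates to false in an XPath predicate.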

Will preg_split work?

    $textNodes = preg_split('/<[^>]+>/', $documentContent, -1, PREG_SPLIT_NO_EMPTY);
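One way to answer that is to try it on a small input and compare it with the DOM result. The sketch below assumes a made-up fragment in $documentContent; $parts and $values are illustrative names. It suggests that preg_split returns plain strings rather than node objects, and that it leaves entities such as &amp; undecoded.

```php
<?php
// Hypothetical sample input, chosen to include an entity.
$documentContent = '<p>Hello <b>world</b> &amp; more</p>';

// Regex approach: split the raw markup on anything that looks like a tag.
$parts = preg_split('/<[^>]+>/', $documentContent, -1, PREG_SPLIT_NO_EMPTY);

// DOM approach, for comparison: collect the value of every text node.
$doc = new DOMDocument();
$doc->loadHTML($documentContent);
$xpath = new DOMXPath($doc);

$values = array();
foreach ($xpath->query('//text()') as $textNode) {
    $values[] = $textNode->nodeValue;
}

var_dump($parts);   // strings taken from the markup as written
var_dump($values);  // strings as the parser sees them, entities decoded
```

So it can work for simple markup when only the raw strings are needed, but it does not give you nodes you can modify in place.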