Retrieving element content using simple-html-dom

Question

I use simple_html_dom to get elements from an HTML page. I have some div elements like the one below. All I want is the "Fine Thanks" text in every div, that is, the text that is not inside any child element. How can I do it?

    <div class="right">
        <h2> <a href="">Hello</a> </h2>
        <br/>
        <span>How Are You?</span>
        <span>How Are You?</span>
        <span>How Are You?</span>
        Fine Thanks
    </div>

+6

html5 php simple-html-dom

Ashkan Apr 11 '13 at 6:37


4 answers

It should simply be $html->find('div.right > text'), but that won't work, because Simple HTML DOM Parser doesn't seem to support direct-child queries.

So you first have to find all the <div> elements and then search their child nodes for a text node. Unfortunately, the ->childNodes() method is mapped to ->children() and therefore only returns elements, not text nodes.

A working solution is to call ->find('text') on each <div> element and then filter the results by parent node.

    foreach ($doc->find('div.right') as $parent) {
        foreach ($parent->find('text') as $node) {
            if ($node->parent() === $parent && strlen($t = trim($node->plaintext))) {
                echo $t, PHP_EOL;
            }
        }
    }

Using DOMDocument, this XPath expression does the same job without the pain:

    $doc = new DOMDocument;
    $doc->loadHTML($content);

    $xp = new DOMXPath($doc);

    foreach ($xp->query('//div/text()') as $node) {
        if (strlen($t = trim($node->textContent))) {
            echo $t, PHP_EOL;
        }
    }
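The XPath query above matches the text of every <div>, not only those with class "right". As a minimal sketch, assuming you want to restrict it to the class from the question, the query can test the class attribute. The helper name directDivText is hypothetical and not part of any library, and the class name is interpolated into the query unescaped, so it should not come from untrusted input.

```php
<?php
// Hypothetical helper: collect the non-empty text nodes that are direct
// children of every <div> carrying the given class.
function directDivText(string $html, string $class = 'right'): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // since real-world markup is rarely perfectly valid.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xp = new DOMXPath($doc);
    // Match the class as a whole word inside the class attribute.
    $query = sprintf(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]/text()",
        $class
    );

    $texts = [];
    foreach ($xp->query($query) as $node) {
        $t = trim($node->textContent);
        if ($t !== '') {
            $texts[] = $t;
        }
    }
    return $texts;
}

$html = '<div class="right"> <h2> <a href="">Hello</a> </h2> <br/> '
      . '<span>How Are You?</span> <span>How Are You?</span> '
      . '<span>How Are You?</span> Fine Thanks </div>';

echo implode(PHP_EOL, directDivText($html)), PHP_EOL; // prints "Fine Thanks"
```

Returning an array rather than echoing inside the loop keeps the helper usable when a page contains several matching divs.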

+2

Ja͢ck Apr 11 '13 at 7:34


I would switch to phpQuery for this one. You still need to use the DOM, but it's not too painful:

    require('phpQuery.php');

    $html = <<<EOF
    <div class="right">
        <h2> <a href="">Hello</a> </h2>
        <br/>
        <span>How Are You?</span>
        <span>How Are You?</span>
        <span>How Are You?</span>
        Fine Thanks
    </div>
    EOF;

    $dom = phpQuery::newDocumentHTML($html);

    foreach ($dom->find("div.right > *:last") as $last_element) {
        echo $last_element->nextSibling->nodeValue;
    }

Update: These days I recommend this simple replacement, which lets you avoid the ugliness of the DOM:

    $doc = str_get_html($html);

    foreach ($doc->find('div.right > text:last') as $el) {
        echo $el->text;
    }

+1

pguardiario Apr 11 '13 at 20:42


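    // Intended as a method added to the simple_html_dom class, so the
    // lookup goes through $this ($html is not defined inside the method).
    public function removeNode($selector)
    {
        foreach ($this->find($selector) as $node) {
            $node->outertext = '';
        }

        // Re-parse the document so the removed nodes are really gone.
        $this->load($this->save());
    }

Use this function to remove the h2 and span elements from the div, then read the remaining content of the div.

Reference URL: Simple HTML DOM: How to remove elements?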

0

Sibiraj PR Apr 11 '13 at 6:47


user2193789 · Accepted Answer · 2013-04-11T07:21:33+0000

There is no built-in method in simple_html_dom.php for reading only an element's own text, but this should work:

    include 'parser.php';

    $html = str_get_html('<div class="right"> <h2> <a href="">Hello</a> </h2> <br/> <span>How Are You?</span> <span>How Are You?</span> <span>How Are You?</span> Fine Thanks </div>');

    function readTextNode($element) {
        // Note: objects are passed by handle, so $local is the same node
        // as $element and the child elements are blanked in place.
        $local = $element;
        $childs = count($element->childNodes());

        // Blank out every child element, leaving only the div's own text.
        for ($i = 0; $i < $childs; $i++) {
            $local->childNodes($i)->outertext = '';
        }

        return $local->innertext;
    }

    echo readTextNode($html->find('div.right', 0));
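The same idea of stripping the child elements and reading what is left can also be sketched with PHP's built-in DOM extension, which does not need simple_html_dom. This is an illustration under assumptions rather than a drop-in equivalent: the helper name ownText is hypothetical, and it works on a deep clone so the original node is left untouched.

```php
<?php
// Hypothetical helper: return the text of $element with all child elements
// removed, working on a deep clone so the original node stays unchanged.
function ownText(DOMElement $element): string
{
    $copy = $element->cloneNode(true);

    // childNodes is a live list, so take a snapshot before removing nodes.
    foreach (iterator_to_array($copy->childNodes) as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $copy->removeChild($child);
        }
    }

    return trim($copy->textContent);
}

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML('<div class="right"><h2><a href="">Hello</a></h2><br/>'
    . '<span>How Are You?</span> Fine Thanks </div>');
libxml_clear_errors();

$div = $doc->getElementsByTagName('div')->item(0);
echo ownText($div), PHP_EOL; // prints "Fine Thanks"
```

Because the removal happens on the clone, $div can still be queried for its h2 and span children afterwards.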
