PHP Simple HTML DOM Parser find string

Question

I use the PHP Simple HTML DOM Parser, but it does not seem to have a way to search for text. I need to find a string and get the id of its parent element. Essentially the reverse of normal use.

Does anyone know how?

+7

dom html php

Charlie Mar 28 '11 at 22:14

4 answers

    $html = file_get_html('http://www.google.com/');
    $eles = $html->find('*');
    foreach ($eles as $e) {
        if (strpos($e->innertext, 'theString') !== false) {
            echo $e->id;
        }
    }

http://simplehtmldom.sourceforge.net/manual.htm

+6

karim79 Mar 28 '11 at 10:21

Here is an answer. The full example is a bit long, but it works, and I also show the output.

The HTML we will search:

    <html>
    <head>
        <title>Simple HTML DOM - Find Text</title>
    </head>
    <body>
        <h3>Simple HTML DOM - Find Text</h3>
        <div id="first">
            <p>This is a paragraph inside of div 'first'. This paragraph does not have the text we are looking for.</p>
            <p>As a matter of fact this div does not have the text we are looking for</p>
        </div>
        <div id="second">
            <ul>
                <li>This is an unordered list.
                <li id="love1">We are looking for the following word love.
                <li>Does not contain the word.
            </ul>
            <p id="love2">This paragraph which is in div second contains the word love.</p>
        </div>
        <div id="third">
            <a id="love3" href="goes.nowhere.com">link to love site</a>
        </div>
    </body>
    </html>

PHP:

    <?php
    include_once('simple_html_dom.php');

    function scraping_for_text($iUrl, $iText)
    {
        echo "iUrl=" . $iUrl . "<br />";
        echo "iText=" . $iText . "<br />";

        // create HTML DOM
        $html = file_get_html($iUrl);

        // get text elements and keep only those that contain the search text
        $aMatches = array();
        foreach ($html->find('text') as $key => $oLove) {
            if (strpos($oLove->plaintext, $iText) !== false) {
                $aMatches[$key] = $oLove;
            }
        }

        // report "found" only when at least one text node actually matched
        if (count($aMatches) > 0) {
            echo "<h4>Found " . $iText . "</h4>";
        } else {
            echo "<h4>No " . $iText . " found</h4>";
        }

        foreach ($aMatches as $key => $oLove) {
            echo $key . ": text=" . $oLove->plaintext . "<br />"
                . "--- parent tag=" . $oLove->parent()->tag . "<br />"
                . "--- parent id=" . $oLove->parent()->id . "<br />";
        }

        // clean up memory
        $html->clear();
        unset($html);
    }

    // -------------------------------------------------------------
    // test it!

    // user_agent header...
    ini_set('user_agent', 'My-Application/2.5');

    scraping_for_text("test_text.htm", "love");
    ?>

Output:

    iUrl=test_text.htm
    iText=love
    Found love
    18: text=We are looking for the following word love.
    --- parent tag=li
    --- parent id=love1
    21: text=This paragraph which is in div second contains the word love.
    --- parent tag=p
    --- parent id=love2
    25: text=link to love site
    --- parent tag=a
    --- parent id=love3

That's all there is to it!

+3

akeane Jul 01 '11 at 23:29

Imagine that every tag has a "plaintext" attribute, and use standard attribute selectors on it.

So HTML:

    <div id="div1">
        <span>London is the capital</span> of Great Britain
    </div>
    <div id="div2">
        <span>Washington is the capital</span> of the USA
    </div>

can be imagined as:

    <div id="div1" plaintext="London is the capital of Great Britain">
        <span plaintext="London is the capital ">London is the capital</span> of Great Britain
    </div>
    <div id="div2" plaintext="Washington is the capital of the USA">
        <span plaintext="Washington is the capital ">Washington is the capital</span> of the USA
    </div>

The PHP to solve your problem is then simple:

    <?php
    $t = '
    <div id="div1">
        <span>London is the capital</span> of Great Britain
    </div>
    <div id="div2">
        <span>Washington is the capital</span> of the USA
    </div>';

    $html = str_get_html($t);
    $foo = $html->find('span[plaintext^=London]');
    echo "ID: " . $foo[0]->parent()->id; // div1
    ?>

(Note that the "plaintext" for the <span> tags ends with a trailing space. This is the default behavior of Simple HTML DOM, defined by the constant DEFAULT_SPAN_TEXT.)
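
For comparison, the same "starts with" match can be approximated without Simple HTML DOM. This is only a rough sketch using PHP's built-in DOMDocument and DOMXPath, where `starts-with(normalize-space(.), ...)` stands in for the `plaintext^=` selector. The variable names are illustrative, and this is not a claim about how Simple HTML DOM implements the selector internally.

```php
<?php
// Same sample markup as above, as a plain string.
$t = '<div id="div1"><span>London is the capital</span> of Great Britain</div>'
   . '<div id="div2"><span>Washington is the capital</span> of the USA</div>';

$doc = new DOMDocument();
$doc->loadHTML($t); // libxml wraps the fragment in <html><body> on its own
$xpath = new DOMXPath($doc);

// Select spans whose whitespace-normalized text starts with "London".
$spans = $xpath->query("//span[starts-with(normalize-space(.), 'London')]");

$parentId = null;
if ($spans->length > 0) {
    // The parent of the first matching span is the enclosing div.
    $parentId = $spans->item(0)->parentNode->getAttribute('id');
}

echo "ID: " . $parentId; // div1
```

Because the needle is embedded in the XPath string, this form only suits search text without single quotes.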

+2

wake-spb Jul 05 '15 at 16:35

Wrikken · Accepted Answer · 2011-03-28T22:19:21+0000

    $d = new DOMDocument();
    $d->loadXML($xml);
    $x = new DOMXPath($d);

    // ids of every ancestor of a text node that contains the search string
    $result = $x->evaluate("//text()[contains(.,'617.99')]/ancestor::*/@id");

    $unique = null;
    // walk the results from last to first and stop at the first id
    // that occurs exactly once in the document
    for ($i = $result->length - 1; $i >= 0 && $item = $result->item($i); $i--) {
        if ($x->query("//*[@id='" . addslashes($item->value) . "']")->length == 1) {
            echo 'Unique ID is ' . $item->value . "\n";
            $unique = $item->value;
            break;
        }
    }
    if (is_null($unique)) echo 'no unique ID found';
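
The snippet above loads XML from a `$xml` variable that is not shown. For an HTML page, a minimal sketch of the same XPath idea with PHP's built-in DOM extension might look like the following. The helper name `findParentIdsByText` and the sample markup are hypothetical, and the sketch reports the nearest ancestor that carries an id instead of checking that the id is unique.

```php
<?php
// Hypothetical helper: for every text node containing $needle, return the id
// of the nearest ancestor element that has an id attribute.
function findParentIdsByText(string $html, string $needle): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally, since real-world HTML is rarely clean.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $ids = [];
    // Select all text nodes and filter in PHP, which avoids XPath quoting issues.
    foreach ($xpath->query('//text()') as $textNode) {
        if (strpos($textNode->nodeValue, $needle) === false) {
            continue;
        }
        // ancestor is a reverse axis, so [1] picks the closest id-bearing element.
        $owner = $xpath->query('ancestor::*[@id][1]', $textNode)->item(0);
        if ($owner !== null) {
            $ids[] = $owner->getAttribute('id');
        }
    }
    return array_values(array_unique($ids));
}

// Made-up sample markup, loosely modelled on the test page earlier in the thread.
$sample = '<html><body><div id="second"><ul>'
        . '<li id="love1">We are looking for the word love.</li>'
        . '<li>Does not contain the word.</li></ul>'
        . '<p id="love2">This paragraph contains the word love.</p>'
        . '</div></body></html>';

echo implode(', ', findParentIdsByText($sample, 'love')); // love1, love2
```

Filtering in PHP with `strpos` sidesteps the need to escape the search string for XPath, which `addslashes` does not really do.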
