Get the value of <h2> in an HTML page with PHP DOM?
I have a variable $link holding an HTTP link (a Craigslist page), and I put its content in $linkhtml. So $linkhtml holds the HTML code of the Craigslist page at $link.
I need to extract the text between <h2> and </h2>. I could use a regexp, but how do I do it with the PHP DOM? So far I have this:
$linkhtml = file_get_contents($link);
$dom = new DOMDocument;
@$dom->loadHTML($linkhtml);
What should I do to put the contents of the <h2> element in the variable $title?
3 answers
If DOMDocument looks complicated to understand or use, you can try PHP Simple HTML DOM Parser, which offers a simpler way to parse HTML:
require 'simple_html_dom.php';

$html = '<h1>Header 1</h1><h2>Header 2</h2>';
$dom = new simple_html_dom();
$dom->load($html);
$title = $dom->find('h2', 0)->plaintext; // text of the first <h2>
echo $title; // outputs: Header 2
You can use this code:
$linkhtml = file_get_contents($link);
$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings for malformed HTML instead of printing them
$doc->loadHTML($linkhtml); // loads your HTML
$xpath = new DOMXPath($doc);
$h2text = $xpath->evaluate("string(//h2/text())"); // $h2text is your text between <h2> and </h2>
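One caveat: `//h2/text()` selects only the text nodes directly inside an <h2>, and `string()` then returns the first of them, so text inside nested tags such as <b> is lost. `string(//h2)` returns the full text of the first <h2> instead. A small self-contained check, assuming hypothetical sample markup in place of the downloaded Craigslist page:

```php
<?php
// Hypothetical sample markup: the <h2> contains a nested <b> element.
$html = '<html><body><h2>Header <b>2</b></h2></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect HTML parse warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// text() selects only the direct text-node children of <h2>;
// string() returns the value of the first one: "Header "
$firstTextNode = $xpath->evaluate('string(//h2/text())');

// string(//h2) returns the full string value of the first <h2>,
// including text inside nested elements: "Header 2"
$fullText = $xpath->evaluate('string(//h2)');

var_dump($firstTextNode, $fullText);
```

For simple headings without nested tags both expressions give the same result; the difference only shows up when the <h2> contains other elements.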
You can also do this with XPath (untested, may contain errors):
$linkhtml = file_get_contents($link);
$dom = new DOMDocument;
@$dom->loadHTML($linkhtml);
$xpath = new DOMXPath($dom);
// "//h2" matches <h2> anywhere; "/html/body/h2" would only match direct children of <body>
$elements = $xpath->query("//h2");
// query() returns false (not null) on an invalid expression
if ($elements !== false) {
    foreach ($elements as $element) {
        echo $element->textContent . "\n"; // full text of each <h2>, including nested tags
    }
}
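If XPath feels like overkill, DOMDocument alone can do this with `getElementsByTagName()`. A minimal sketch, assuming you want the text of every <h2> in order and the first one as $title; the helper name `h2Texts` is hypothetical, and the inline HTML stands in for the downloaded page:

```php
<?php
/**
 * Return the text of every <h2> in an HTML string, in document order.
 * Sketch only: the function name h2Texts is hypothetical.
 */
function h2Texts(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    $texts = [];
    foreach ($dom->getElementsByTagName('h2') as $h2) {
        // textContent includes text from nested elements; trim stray whitespace
        $texts[] = trim($h2->textContent);
    }
    return $texts;
}

// In the question's setting this would be h2Texts(file_get_contents($link));
// here an inline string stands in for the downloaded page.
$all = h2Texts('<h1>Header 1</h1><h2>Header 2</h2><h2> Second <i>h2</i> </h2>');
$title = $all[0] ?? null; // null when the page has no <h2>
print_r($all);
```

Returning every heading rather than just the first keeps the helper reusable when a page has several <h2> elements; the `?? null` fallback avoids an undefined-index warning on pages without one.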