Get value of <h2> html page with PHP DOM?

Question

I have a Craigslist URL in the variable $link and load the page into $linkhtml, so $linkhtml holds the HTML source of the page at $link.

I need to extract the text between <h2> and </h2>. I could use a regular expression, but how do I do it with the PHP DOM extension? This is what I have so far:

    $linkhtml = file_get_contents($link);
    $dom = new DOMDocument;
    @$dom->loadHTML($linkhtml);

How do I get the contents of the <h2> element into the variable $title?

+4

php

Matt May 09 '12 at 10:13

3 answers

You can use this code:

    $linkhtml = file_get_contents($link);
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($linkhtml); // loads your html
    $xpath = new DOMXPath($doc);
    $h2text = $xpath->evaluate("string(//h2/text())");
    // $h2text is your text between <h2> and </h2>
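If the heading contains nested markup such as <b> or <a>, `//h2/text()` yields only the first direct text node of the <h2>. A minimal sketch of one way around that, assuming the goal is the full visible text in $title: take the string value of the element itself and trim it with XPath's `normalize-space()`. The sample markup below is hypothetical and stands in for the fetched page.

```php
<?php
// Hypothetical sample markup standing in for the fetched page ($linkhtml).
$linkhtml = '<html><body><div><h2> Nice <b>bike</b> for sale </h2></div></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$doc->loadHTML($linkhtml);
libxml_clear_errors();              // discard the collected warnings

$xpath = new DOMXPath($doc);
// string(//h2) is the text content of the first <h2>, child elements included;
// normalize-space() trims it and collapses runs of whitespace.
$title = $xpath->evaluate('normalize-space(string(//h2))');

echo $title, "\n";   // prints: Nice bike for sale
```

When the page has no <h2> at all, the expression evaluates to an empty string, so checking `$title === ''` is one way to detect that case.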

+3

anubhava May 09 '12 at 10:20

You can do this with XPath (untested, may contain errors):

    $linkhtml = file_get_contents($link);
    $dom = new DOMDocument;
    @$dom->loadHTML($linkhtml);   // @ suppresses warnings about malformed HTML
    $xpath = new DOMXPath($dom);
    // "//h2" matches <h2> at any depth; "/html/body/h2" would match only direct children of <body>
    $elements = $xpath->query("//h2");
    if ($elements !== false) {    // query() returns false for an invalid expression
        foreach ($elements as $element) {
            foreach ($element->childNodes as $node) {
                echo $node->nodeValue . "\n";
            }
        }
    }
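If XPath is not needed, the DOM extension's own `getElementsByTagName()` also finds <h2> elements at any depth. The following is only a sketch under that assumption: it collects every heading into an array (the name $titles is chosen here for illustration) and takes the first one as $title. The sample markup is hypothetical and stands in for the fetched page.

```php
<?php
// Hypothetical sample markup standing in for the fetched page ($linkhtml).
$linkhtml = '<html><body><h2>First</h2><div><h2> Second </h2></div></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom->loadHTML($linkhtml);
libxml_clear_errors();

// Gather the trimmed text of every <h2>, in document order.
$titles = [];
foreach ($dom->getElementsByTagName('h2') as $h2) {
    $titles[] = trim($h2->textContent);
}

// First heading, or null when the page has no <h2>.
$title = $titles[0] ?? null;

print_r($titles);
```

Using `textContent` rather than walking `childNodes` keeps the text of nested elements together in one string per heading.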

+1

ilanco May 09 '12 at 10:16

Conrad warhol · Accepted Answer · 2012-05-09T22:21:52+0000

If DOMDocument seems too complicated to understand or use, you can try PHP Simple HTML DOM Parser, which offers a simpler way to parse HTML.

    require 'simple_html_dom.php';

    $html = '<h1>Header 1</h1><h2>Header 2</h2>';
    $dom = new simple_html_dom();
    $dom->load($html);
    $title = $dom->find('h2', 0)->plaintext;
    echo $title; // outputs: Header 2
