Get the value of an <h2> in an HTML page with PHP DOM?

I have a Craigslist URL in the variable $link, and I load that page's HTML into the variable $linkhtml.

I need to extract the text between <h2> and </h2>. I could use a regexp, but how do I do it with the PHP DOM? So far I have this:

    $linkhtml = file_get_contents($link);
    $dom = new DOMDocument;
    @$dom->loadHTML($linkhtml);

How do I put the contents of the <h2> element into the variable $title?

3 answers

If DOMDocument looks too complicated to understand or use, you can try the third-party PHP Simple HTML DOM Parser, which provides an easy way to parse HTML.

    require 'simple_html_dom.php';

    $html = '<h1>Header 1</h1><h2>Header 2</h2>';
    $dom = new simple_html_dom();
    $dom->load($html);
    $title = $dom->find('h2', 0)->plaintext;
    echo $title; // outputs: Header 2

You can use this code:

    $linkhtml = file_get_contents($link);
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($linkhtml); // loads your html
    $xpath = new DOMXPath($doc);
    $h2text = $xpath->evaluate("string(//h2/text())");
    // $h2text is your text between <h2> and </h2>
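One caveat with string(//h2/text()): it returns only the first text node that is a direct child of the first <h2>, so text wrapped in nested tags such as <span> is skipped. The sketch below is one possible variation, not the answerer's code. It uses string(//h2), which concatenates all descendant text of the first <h2>, and normalize-space() to trim whitespace. The function name firstH2Text and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper (not from the original answers): returns the text of the
// first <h2> in an HTML string, or '' if there is none.
function firstH2Text(string $html): string
{
    if ($html === '') {
        return ''; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse warnings quietly
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // string(//h2) is the string-value of the first <h2>: all of its descendant
    // text concatenated. normalize-space() trims and collapses whitespace.
    return $xpath->evaluate('normalize-space(string(//h2))');
}

// Sample markup is an assumption, chosen to show nested tags inside the <h2>.
$sample = '<html><body><h1>Header 1</h1><h2><span>Header</span> 2</h2></body></html>';
$title = firstH2Text($sample);
echo $title, "\n"; // prints: Header 2
```

With real pages, $sample would be replaced by the $linkhtml string from the question.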

You can do this with XPath (untested, may contain errors):

    $linkhtml = file_get_contents($link);
    $dom = new DOMDocument;
    @$dom->loadHTML($linkhtml);
    $xpath = new DOMXPath($dom);
    // "//h2" matches an <h2> anywhere in the page;
    // "/html/body/h2" would only match direct children of <body>
    $elements = $xpath->query("//h2");
    if ($elements !== false) { // query() returns false for an invalid expression
        foreach ($elements as $element) {
            foreach ($element->childNodes as $node) {
                echo $node->nodeValue . "\n";
            }
        }
    }
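For comparison, the same kind of loop can be written with plain DOM methods and no XPath. This is only a sketch under assumptions: the helper name allH2Texts and the sample markup are invented here. It relies on DOMDocument::getElementsByTagName() to find every <h2> and on the textContent property to read each element's full text, including text inside nested tags.

```php
<?php
// Hypothetical helper: returns the trimmed text of every <h2>, in document order.
function allH2Texts(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // avoid warnings on sloppy HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($dom->getElementsByTagName('h2') as $h2) {
        // textContent includes the text of nested elements such as <b>
        $texts[] = trim($h2->textContent);
    }
    return $texts;
}

// Sample markup is an assumption; the <h2> elements are nested inside a <div>.
$sample = '<div><h2>First</h2><p>text</p><h2>Second <b>bold</b></h2></div>';
$titles = allH2Texts($sample);
$title = $titles[0] ?? ''; // first heading, or '' when the page has no <h2>
echo $title, "\n"; // prints: First
```

Taking $titles[0] with a fallback gives the single $title the question asks for, while the full array stays available if the page has several headings.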

Source: https://habr.com/ru/post/1411644/

