Retrieving elements with XPath and DOMDocument

I have a list of ads in the HTML below. I need a PHP loop that extracts the following for each ad:

  • Ad URL (href attribute of the <a> tag)
  • Image URL (src attribute of the <img>)
  • Ad title (content of the <div class="title">)

    <div class="ads">
        <a href="http://path/to/ad/1">
            <div class="ad">
                <div class="image">
                    <div class="wrapper">
                        <img src="http://path/to/ad/1/image.jpg">
                    </div>
                </div>
                <div class="detail">
                    <div class="title">Ad #1</div>
                </div>
            </div>
        </a>
        <a href="http://path/to/ad/2">
            <div class="ad">
                <div class="image">
                    <div class="wrapper">
                        <img src="http://path/to/ad/2/image.jpg">
                    </div>
                </div>
                <div class="detail">
                    <div class="title">Ad #2</div>
                </div>
            </div>
        </a>
    </div>

I managed to get the URL of the ad with the PHP code below.

    $d = new DOMDocument();
    $d->loadHTML($ads); // the variable $ads contains the HTML code above
    $xpath = new DOMXPath($d);
    $ls_ads = $xpath->query('//a');
    foreach ($ls_ads as $ad) {
        $ad_url = $ad->getAttribute('href');
        print("AD URL : $ad_url");
    }

But I could not get the other two elements (image URL and title). Any ideas?

2 answers

I got what I needed with this code (based on Hue Wu's code):

    $d = new DOMDocument();
    $d->loadHTML($ads); // the variable $ads contains the HTML code above
    $xpath = new DOMXPath($d);
    $ls_ads = $xpath->query('//a');
    foreach ($ls_ads as $ad) {
        // get ad url
        $ad_url = $ad->getAttribute('href');

        // set current ad object as new DOMDocument object so we can parse it
        $ad_Doc = new DOMDocument();
        $cloned = $ad->cloneNode(true);
        $ad_Doc->appendChild($ad_Doc->importNode($cloned, true));
        $xpath = new DOMXPath($ad_Doc);

        // get ad title
        $ad_title_tag = $xpath->query("//div[@class='title']");
        $ad_title = trim($ad_title_tag->item(0)->nodeValue);

        // get ad image
        $ad_image_tag = $xpath->query("//img/@src");
        $ad_image = $ad_image_tag->item(0)->nodeValue;
    }

For the other elements, you do the same thing:

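    foreach ($ls_ads as $ad) {
        $ad_url = $ad->getAttribute('href');
        print("AD URL : $ad_url");

        // Copy the <a> node and its descendants (deep import) into a fresh document.
        // A new DOMDocument has no documentElement yet, so append to the document itself.
        $ad_Doc = new DOMDocument();
        $ad_Doc->appendChild($ad_Doc->importNode($ad, true));
        $xpath = new DOMXPath($ad_Doc);

        // Select the src attribute and the title text rather than the node lists
        $img_src = $xpath->query("//img/@src")->item(0)->nodeValue;
        $title = trim($xpath->query("//div[@class='title']")->item(0)->nodeValue);
    }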
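Both answers copy each <a> node into a new DOMDocument before querying it. A possible alternative, shown here only as a sketch, is to keep one DOMXPath object and pass the <a> node as the context node, using relative paths that start with a dot. The helper name extractAds() and the array keys are made up for illustration, and the libxml error handling is an assumption about how imperfect markup might be tolerated.

```php
<?php
// Sketch: query relative to each <a> node instead of cloning it into a new document.
// extractAds() is a hypothetical helper, not part of the original answers.
function extractAds(string $html): array
{
    $doc = new DOMDocument();

    // Assumption: collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $ads = [];

    foreach ($xpath->query('//a') as $a) {
        $ads[] = [
            'url'   => $a->getAttribute('href'),
            // The leading "." makes the path relative to the context node $a.
            // string() yields an empty string when nothing matches.
            'image' => $xpath->evaluate('string(.//img/@src)', $a),
            'title' => trim($xpath->evaluate('string(.//div[@class="title"])', $a)),
        ];
    }

    return $ads;
}
```

Calling extractAds($ads) with the HTML from the question should return one array per <a> element, each holding the url, image and title values. Without the leading dot, a path such as //img would search the whole document again rather than only the current ad.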

Source: https://habr.com/ru/post/926011/

