(PHP5) Retrieve the title tag and RSS feed address from HTML using PHP DOM or Regex

I would like to get the title tag and the RSS feed address (if any) from a given URL, but the methods I have tried so far don't work. I managed to get the title tag using preg_match and a regex, but I can't get anywhere with the RSS feed address.

(`$webContent` holds the HTML of the page.)

My code is below for reference:

```php
// Get the title tag
preg_match('@<title>(.*)</title>@i', $webContent, $titleTagArray);

// If the title tag has been found, assign it to a variable
if ($titleTagArray && $titleTagArray[1]) $webTitle = $titleTagArray[1];

// Get the RSS or Atom feed address
preg_match('@<link(.*)rel="alternate"(.*)href="(.*)"(.*)type="application/rss+xml"\s/>@i', $webContent, $feedAddrArray);

// If the feed address has been found, assign it to a variable
if ($feedAddrArray && $feedAddrArray[2]) $webFeedAddr = $feedAddrArray[2];
```

I've read here that regex isn't the best way to do this. I hope someone can give me a hand with this :-)

Thanks.

2 answers

One approach

```php
$dom = new DOMDocument;             // init a new DOMDocument
$dom->loadHTML($html);              // load the HTML into it
$xpath = new DOMXPath($dom);        // create a new XPath object
$nodes = $xpath->query('//title');  // find all title elements in the document
foreach ($nodes as $node) {         // iterate over the found elements
    echo $node->nodeValue;          // output the title text
}
```
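If you want this as something reusable, here is a minimal sketch that wraps the same DOMDocument/DOMXPath steps in a function. The name `getPageTitle` is hypothetical, and it assumes the whole page is already in a string (like `$webContent` in the question). It also turns on `libxml_use_internal_errors()` so that malformed real-world HTML doesn't flood the output with warnings, and it returns `null` when the page has no `<title>`. The type declarations need PHP 7.1+; drop them on PHP 5.

```php
<?php
// Hypothetical helper: return the trimmed text of the first <title>
// element in $html, or null if the document has none.
function getPageTitle(string $html): ?string
{
    // Collect libxml parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//title');
    if ($nodes === false || $nodes->length === 0) {
        return null; // no <title> found
    }
    return trim($nodes->item(0)->nodeValue);
}

echo getPageTitle('<html><head><title> Example page </title></head><body></body></html>'), "\n";
// prints "Example page"
```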

To get the `href` attribute of every `link` element whose type is `application/rss+xml`, use this XPath:

```php
$xpath->query('//link[@type="application/rss+xml"]/@href');
```
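A sketch of the feed lookup along the same lines, assuming you also want Atom feeds (the question mentions both): it matches `link` elements whose `type` is either `application/rss+xml` or `application/atom+xml` and collects their `href` values. The function name `findFeedLinks` and the sample markup are made up for illustration. XPath does not care about attribute order, which the regex in the question depends on. The returned URLs may be relative, so you may still need to resolve them against the page URL.

```php
<?php
// Hypothetical helper: collect feed URLs advertised in <link> elements.
// Assumption: both RSS and Atom types count as feeds; rel is not checked.
function findFeedLinks(string $html): array
{
    // Suppress warnings from malformed HTML while parsing.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Select the href attribute nodes of matching <link> elements.
    $hrefs = $xpath->query(
        '//link[@type="application/rss+xml" or @type="application/atom+xml"]/@href'
    );

    $urls = [];
    foreach ($hrefs as $attr) {
        $urls[] = $attr->value; // DOMAttr::$value holds the attribute text
    }
    return $urls;
}

$html = '<html><head>'
    . '<link href="/feed.xml" type="application/rss+xml" rel="alternate">'
    . '<link rel="alternate" type="application/atom+xml" href="/atom.xml">'
    . '<link rel="stylesheet" href="/style.css">'
    . '</head><body></body></html>';

print_r(findFeedLinks($html));
```

For the sample markup this prints the two feed URLs in document order and skips the stylesheet link, even though the attributes appear in a different order in each tag.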

Regex is far from the best solution ;) Use a feed reader instead, for example the Zend_Feed class from Zend Framework.
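To illustrate one concrete way the regex in the question goes wrong: in a PCRE pattern `+` is a quantifier, so `rss+xml` means `rs`, one or more `s`, then `xml`, and it never matches the literal text `rss+xml`. The snippet below uses a made-up `<link>` tag to show this and how `preg_quote()` escapes it. Even with the escape, a regex still depends on the attribute order and quoting style of the markup, which is why a DOM parser or a feed library is the safer choice.

```php
<?php
// Made-up sample tag containing a literal "rss+xml".
$tag = '<link rel="alternate" href="/feed.xml" type="application/rss+xml" />';

// Unescaped "+" acts as a quantifier on the preceding "s", so no match.
var_dump(preg_match('@type="application/rss+xml"@i', $tag));   // int(0)

// preg_quote() escapes regex metacharacters such as "+" (and the "@" delimiter).
$escaped = preg_quote('type="application/rss+xml"', '@');
var_dump(preg_match('@' . $escaped . '@i', $tag));             // int(1)
```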


Source: https://habr.com/ru/post/1312981/

