Regex to find all URLs and link titles

I would like to extract all the URLs and link titles from a paragraph of text.

Les <a href="" class="c_link-blue">résultats du sondage</a> sur les remakes et suites souhaités sont <a href="" class="c_link-blue">dans le blog</a>. 

I can get every href with the following regex, but I don't know how to also capture the text between the <a> and </a> tags.

 preg_match_all('/<a.*href="?([^" ]*)" /iU', $v['message'], $urls); 

Ideally I would get an associative array like this:

 [0] => Array ( [title] => XXX [link] => ) [1] => Array ( [title] => XXX [link] => ) 

thanks for the help

5 answers

If you still insist on using regular expressions to solve this problem, you can parse it with this regular expression:


Note that, unlike your pattern, it does not use the U modifier.
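The pattern from this answer is not shown above. As a rough sketch of the kind of expression it describes (an assumption, not the answerer's original), the following uses greedy quantifiers with negated character classes instead of the U modifier. The `$message` sample and its href values are made up for illustration, and the named groups `link` and `title` are my own choice so that the result matches the shape the question asks for.

```php
<?php
// Hypothetical sample input adapted from the question (href values are made up).
$message = 'Les <a href="/sondage" class="c_link-blue">résultats du sondage</a> '
         . 'sont <a href="/blog" class="c_link-blue">dans le blog</a>.';

// No U modifier: quantifiers stay greedy, and the negated classes
// ([^>], [^"], [^<]) keep each part of the match inside a single tag.
$pattern = '/<a\b[^>]*\bhref="(?<link>[^"]*)"[^>]*>(?<title>[^<]*)<\/a>/i';

preg_match_all($pattern, $message, $matches, PREG_SET_ORDER);

// Reshape each match into the title/link pair requested in the question.
$links = [];
foreach ($matches as $m) {
    $links[] = ['title' => $m['title'], 'link' => $m['link']];
}

print_r($links);
```

Because the title group is `[^<]*`, this sketch only matches links whose content is plain text; a link wrapping another tag such as `<b>` would be skipped.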

Update: To accept single quotes as well as double quotes, you can use the following pattern:
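The updated pattern is also missing here. One way to accept either quote style, sketched under the assumption that the opening and closing quote must be the same character, is to capture the quote and reuse it with a backreference. The input string and its href values are hypothetical.

```php
<?php
// Hypothetical input mixing both quote styles (href values are made up).
$message = "<a href='/one'>first</a> and <a href=\"/two\">second</a>";

// (["\']) captures the opening quote as group 1; \1 then requires the
// same character to close the attribute value.
$pattern = '/<a\b[^>]*\bhref=(["\'])(?<link>.*?)\1[^>]*>(?<title>[^<]*)<\/a>/i';

preg_match_all($pattern, $message, $matches, PREG_SET_ORDER);

$links = [];
foreach ($matches as $m) {
    $links[] = ['title' => $m['title'], 'link' => $m['link']];
}

print_r($links);
```

Unquoted attribute values (`href=/one`) are not handled by this sketch.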


As mentioned in the comments, use a DOM parser rather than a regular expression. For instance:

    <?php
    $doc = new DOMDocument;
    $doc->loadHTML(getExampleData());

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('/html/body/p[@id="abc"]//a') as $node) {
        echo $node->getAttribute('href'), ' - ', $node->textContent, "\n";
    }

    function getExampleData() {
        return '<html><head><title>...</title></head><body>
            <p> not <a href="wrong">this one</a> but .... </p>
            <p id="abc"> Les <a href="" class="c_link-blue">résultats du sondage</a> sur les remakes et suites souhaités sont <a href="" class="c_link-blue">dans le blog</a>. </p>
        </body></html>';
    }



You should not use a regex for this; use an XML/DOM parser instead. Here is a quick version using DOMDocument:

    $links = array();

    $dom = new DOMDocument;
    @$dom->loadHTML('Les <a href="" class="c_link-blue">résultats du sondage</a> sur les remakes et suites souhaités sont <a href="" class="c_link-blue">dans le blog</a>.');

    $xPath = new DOMXPath($dom);
    $a = $xPath->query('//a');

    for ($i = 0; $i < $a->length; $i++) {
        $e = $a->item($i);
        $links[] = array(
            'title' => $e->nodeValue,
            'link'  => $e->getAttribute('href')
        );
    }

    print_r($links);


 preg_match_all('/<a[^>]*href="([^"]*)"[^>]*>([^<]*)<\/a>/', $v['message'], $urls, PREG_SET_ORDER);

should give you what you want. The result is not an associative array, but a nested array in roughly the desired shape: each entry holds the full match at index 0, the href at index 1, and the link text at index 2.
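If the exact title/link keys from the question are needed, the nested result can be reshaped afterwards. A minimal sketch, assuming a pattern with the href in group 1 and the link text in group 2; the `$message` sample and its href values are hypothetical:

```php
<?php
// Hypothetical input; the href values are placeholders for illustration.
$message = 'See <a href="/a" class="x">Alpha</a> and <a href="/b">Beta</a>.';

// With PREG_SET_ORDER each element of $urls is one match:
// [0] => whole <a>...</a>, [1] => href value, [2] => link text.
preg_match_all(
    '/<a[^>]*href="([^"]*)"[^>]*>([^<]*)<\/a>/',
    $message,
    $urls,
    PREG_SET_ORDER
);

// Map the numeric indexes to the keys requested in the question.
$links = array_map(
    function (array $m) {
        return ['title' => $m[2], 'link' => $m[1]];
    },
    $urls
);

print_r($links);
```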


To the people suggesting the DOM: yes, using the DOM would be nice. But surely you would not pull in a full DOM parser just to extract a couple of URLs and titles!

Just use a regex like this:

 /<a.*href="([^" ]*)".*>(.*)<\/a>/iU 
