How to find a URL from content in PHP?

I need a preg_match that looks for "c.aspx" (without quotes) in the content and, if it is found, returns the entire URL. For example:

$content = '<div>[4]<a href="/m/c.aspx?mt=01_9310ba801f1255e02e411d8a7ed53ef95235165ee4fb0226f9644d439c11039f%7c8acc31aea5ad3998&amp;n=783622212">New message</a><br/>'; 

Now preg_match should find "c.aspx" in $content and give the result as

 "/m/c.aspx?mt=01_9310ba801f1255e02e411d8a7ed53ef95235165ee4fb0226f9644d439c11039f%7c8acc31aea5ad3998&amp;n=783622212" 

$content may contain other links besides the "c.aspx" one. I do not want those. I only want the whole URL that contains "c.aspx".

Please let me know how I can do this.

2 answers

Use the DOM to parse HTML, not a regular expression. Once you have an attribute value, you can use a regular expression (or a plain string search) on it.

Edit: Updated the example so it checks for c.aspx.

    $content = '<div>[4]<a href="/m/c.aspx?mt=01_9310ba801f1255e02e411d8a7ed53ef95235165ee4fb0226f9644d439c11039f%7c8acc31aea5ad3998&amp;n=783622212">New message</a> <a href="#bar">foo</a> <br/>';

    $dom = new DOMDocument();
    $dom->loadHTML($content);
    $anchors = $dom->getElementsByTagName('a');

    // length is already an integer, so compare it directly
    if ( $anchors->length > 0 ) {
        foreach ( $anchors as $anchor ) {
            if ( $anchor->hasAttribute('href') ) {
                $link = $anchor->getAttribute('href');
                // strpos() returns 0 for a match at the start, so compare against false
                if ( strpos( $link, 'c.aspx' ) !== false ) {
                    echo $link;
                }
            }
        }
    }
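As a possible variation on the DOM approach above, the filtering can be pushed into an XPath expression instead of a manual loop with strpos(). This is only a sketch: the `$links` array and the choice to silence libxml warnings are assumptions made for illustration, not part of the original answer.

```php
<?php
// Sample input from the question, plus one unrelated link.
$content = '<div>[4]<a href="/m/c.aspx?mt=01_9310ba801f1255e02e411d8a7ed53ef95235165ee4fb0226f9644d439c11039f%7c8acc31aea5ad3998&amp;n=783622212">New message</a> <a href="#bar">foo</a> <br/>';

// Assumption: the markup may be sloppy, so collect parser warnings quietly.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Select only <a> elements whose href attribute contains "c.aspx".
$links = [];
foreach ($xpath->query('//a[contains(@href, "c.aspx")]') as $anchor) {
    $links[] = $anchor->getAttribute('href');
}

print_r($links);
```

Note that getAttribute() returns the decoded attribute value, so `&amp;` in the markup comes back as a plain `&`.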

If you want to find any quoted string with c.aspx in it:

 /"[^"]*c\.aspx[^"]*"|'[^']*c\.aspx[^']*'/ 

But really, for parsing most HTML you are better off with some kind of DOM parser, so that you can be sure that what you matched really is an href.
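For completeness, here is a rough sketch of how the pattern above might be used from PHP with preg_match_all(). The variable names (`$pattern`, `$urls`) are assumptions for illustration, and the caveat still applies: this matches any quoted string containing c.aspx, not only href values.

```php
<?php
// Sample input from the question, plus one unrelated link.
$content = '<div>[4]<a href="/m/c.aspx?mt=01_9310ba801f1255e02e411d8a7ed53ef95235165ee4fb0226f9644d439c11039f%7c8acc31aea5ad3998&amp;n=783622212">New message</a> <a href="#bar">foo</a> <br/>';

// Any double- or single-quoted string that contains "c.aspx".
$pattern = '/"[^"]*c\.aspx[^"]*"|\'[^\']*c\.aspx[^\']*\'/';

$urls = [];
if (preg_match_all($pattern, $content, $matches)) {
    foreach ($matches[0] as $quoted) {
        // Strip the surrounding quote characters from each match.
        $urls[] = substr($quoted, 1, -1);
    }
}

print_r($urls);
```

Unlike the DOM version, the matched text is returned exactly as it appears in the markup, so `&amp;` stays encoded. Pass each result through html_entity_decode() if the decoded URL is needed.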
