Extract link attributes from HTML string

What is the best way to extract the link URL (the href attribute) from the HTML stored in $var?

Example $var:

    $var = '<a href="http://stackoverflow.com/">Stack Overflow</a>';

I want to get:

    $var2 = "http://stackoverflow.com/";

One option is preg_match(). What other options are there?

+4
5 answers

Instead of handling a long complex regex, do it in steps

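    $str = '<a href="http://stackoverflow.com/"> Stack Overflow</a>';
    // Remove everything up to and including href="
    $str = preg_replace("/.*<a\s+href=\"/", "", $str);
    // Remove everything from the closing quote onward
    print preg_replace("/\">.*/", "", $str);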

Another way, without a regex, is to use explode():

    $str = '<a href="http://stackoverflow.com/"> Stack Overflow</a>';
    $s = explode('href="', $str);  // $s[1] starts at the URL
    $t = explode('">', $s[1]);     // $t[0] is the URL itself
    print $t[0];
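Since the question mentions preg_match(), the two replacement steps can also be folded into a single match with a capture group. This is only a sketch: the function name firstHrefByRegex is made up for illustration, it assumes the href value is quoted, and like any regex over HTML it can be fooled by unusual markup.

```php
<?php
// Hypothetical helper: returns the first quoted href value in $html, or null.
function firstHrefByRegex(string $html): ?string
{
    // Group 1 remembers which quote character opened the value;
    // group 2 lazily captures everything up to the same quote character.
    $pattern = '/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is';
    if (preg_match($pattern, $html, $m) === 1) {
        return $m[2];
    }
    return null;
}

echo firstHrefByRegex('<a href="http://stackoverflow.com/">Stack Overflow</a>'), "\n";
// http://stackoverflow.com/
```

Because the closing quote is matched with a backreference, the same pattern accepts both href="..." and href='...'.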
+5

If what you have is a valid HTML string, then DOMDocument's loadHTML() method will parse it, and you can easily navigate the resulting structure. This is a good approach if you have a lot of HTML to work with.

    $doc = new DOMDocument();
    $doc->loadHTML('<a href="http://stackoverflow.com/">Stack Overflow</a>');
    $anchors = $doc->getElementsByTagName('a');
    foreach ($anchors as $node) {
        // Print the link text, then every attribute as "name: value"
        echo $node->textContent;
        if ($node->hasAttributes()) {
            foreach ($node->attributes as $a) {
                echo ' | ' . $a->name . ': ' . $a->value;
            }
        }
    }

This produces the following:

    Stack Overflow | href: http://stackoverflow.com/
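To get only the URL that the question asks for, rather than every attribute, the same DOMDocument approach can read the href directly with getAttribute(). The following is a minimal sketch; the function name extractHrefs and the choice to silence libxml warnings are assumptions for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper: collects the href of every <a> element in $html.
function extractHrefs(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // since real-world markup is often not perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) {
            $hrefs[] = $anchor->getAttribute('href');
        }
    }
    return $hrefs;
}

$var  = '<a href="http://stackoverflow.com/">Stack Overflow</a>';
$var2 = extractHrefs($var)[0] ?? null;
echo $var2, "\n"; // http://stackoverflow.com/
```

Returning an array keeps the helper usable when the string contains several links; the ?? null fallback covers input with no links at all.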
+4

strip_tags() removes HTML tags from the value of a variable. The second parameter is useful if you want to make exceptions and keep specific tags, for example the paragraph tag <p>.

    $text = '<p>Paragraph.</p> <!-- boo --> <a href="#">Other text</a>';
    echo strip_tags($text);
    // Paragraph. Other text
    echo strip_tags($text, '<p><a>');
    // <p>Paragraph.</p> <a href="#">Other text</a>

phpQuery

If you want to stay away from regular expressions, you can load the markup with phpQuery and then use jQuery-style selectors and methods to get your value:

    // Bring in phpQuery
    require("phpQuery-onefile.php");
    // Load up our HTML
    phpQuery::newDocumentHTML("<a href='http://sampsonresume.com/'>Homepage</a>");
    // Print the HREF attribute of the first anchor
    print pq("a:first")->attr("href");
    // http://sampsonresume.com/

Regex

To find the url you can use the following:

    $var = "<a href='http://sampsonresume.com/'>Homepage</a>";
    preg_match("(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)", $var, $match);
    print $match[0];
    // http://sampsonresume.com/
+1

Use the following regular expression:

    \b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.])(?:[^\s()<>]+|\([^\s()<>]+\))+(?:\([^\s()<>]+\)|[^`!()\[\]{};:'".,<>?«»“”‘’\s]))
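The answer gives only the pattern, so here is one possible way to apply it from PHP. This is a sketch under several assumptions: the function name findUrls is invented, the ~ delimiter and the i and u flags are choices made here rather than part of the answer, and the source file is assumed to be saved as UTF-8 so that the non-ASCII quotation marks in the last character class are valid under the u flag.

```php
<?php
// Hypothetical helper: returns every URL-like substring found in $text.
function findUrls(string $text): array
{
    // A nowdoc keeps the pattern free of PHP string escaping.
    // ~ is used as the delimiter because it does not occur in the pattern.
    // i: case-insensitive match, u: treat pattern and subject as UTF-8.
    $pattern = <<<'REGEX'
~\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.])(?:[^\s()<>]+|\([^\s()<>]+\))+(?:\([^\s()<>]+\)|[^`!()\[\]{};:'".,<>?«»“”‘’\s]))~iu
REGEX;

    if (preg_match_all($pattern, $text, $m) > 0) {
        return $m[0]; // full matches
    }
    return [];
}

print_r(findUrls('<a href="http://stackoverflow.com/">Stack Overflow</a>'));
```

Note that this pattern looks for URLs anywhere in the text, not specifically inside href attributes, so it would also pick up a URL that appears in the link text.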
0
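    <?php
    // $html is assumed to hold the markup to scan; this snippet does not define it.
    preg_match_all("#<a\s[^>]*href\s*=\s*[\'\"]??\s*?(?'path'[^\'\"\s]+?)[\'\"\s]{1}[^>]*>(?'name'[^>]*)<#simU", $html, $hrefs, PREG_SET_ORDER);
    // Each match exposes the URL as 'path' and the link text as 'name'
    foreach ($hrefs as $urls) {
        print $urls['path'] . "<br>";
    }
    ?>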
0