Extract link attributes from HTML string

Question

What is the best way to extract a link's attributes from the HTML in $var?

Example of $var:

    $var = '<a href="http://stackoverflow.com/">Stack Overflow</a>';

I want to get:

    $var2 = "http://stackoverflow.com/";

For example, with preg_match()?

What else could I use?

+4

php

alexus Jan 15 '10 at 1:58

5 answers

If what you have is a valid HTML string, then DOMDocument's loadHTML() method will parse it, and you can easily navigate the resulting structure. This is a good approach if you have a lot of HTML to work with.

    $doc = new DOMDocument();
    $doc->loadHTML('<a href="http://stackoverflow.com/">Stack Overflow</a>');
    $anchors = $doc->getElementsByTagName('a');
    foreach ($anchors as $node) {
        echo $node->textContent;
        if ($node->hasAttributes()) {
            foreach ($node->attributes as $a) {
                echo ' | ' . $a->name . ': ' . $a->value;
            }
        }
    }

produces the following:

    Stack Overflow | href: http://stackoverflow.com/
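If only the URL is needed, as in the question's $var2, the same DOMDocument approach can read the href attribute directly. The following is a minimal sketch, not part of the original answer: firstHref() is a hypothetical helper name, and the libxml error handling is an assumption added so that parser warnings about HTML fragments are not printed.

```php
<?php
// Hypothetical helper (not from the answer above): returns the href of the
// first <a> element that has one, or null if no such element exists.
function firstHref(string $html): ?string
{
    $doc = new DOMDocument();

    // Fragments are rarely complete documents, so collect libxml warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) {
            return $anchor->getAttribute('href');
        }
    }
    return null;
}

$var  = '<a href="http://stackoverflow.com/">Stack Overflow</a>';
$var2 = firstHref($var);
echo $var2, PHP_EOL; // http://stackoverflow.com/
```

Returning null rather than an empty string lets the caller tell "no link found" apart from a link whose href is empty.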

+4

zombat Jan 15 '10 at 2:53

strip_tags() removes HTML from the value of a variable. The second parameter is useful if you want to make exceptions and keep specific tags, for example the paragraph tag.

    $text = '<p>Paragraph.</p> <!-- boo --> <a href="#">Other text</a>';
    echo strip_tags($text);           // Paragraph. Other text
    echo strip_tags($text, '<p><a>'); // <p>Paragraph.</p> <a href="#">Other text</a>

phpQuery

If you want to stay away from regular expressions, you can use phpQuery to process the value, and then use jQuery-style selectors and methods to get your value:

    // Bring in phpQuery
    require("phpQuery-onefile.php");
    // Load up our HTML
    phpQuery::newDocumentHTML("<a href='http://sampsonresume.com/'>Homepage</a>");
    // Print the HREF attribute of the first Anchor
    print pq("a:first")->attr("href"); // http://sampsonresume.com/

Regex

To find the URL, you can use the following:

    $var = "<a href='http://sampsonresume.com/'>Homepage</a>";
    preg_match("(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)", $var, $match);
    print $match[0]; // http://sampsonresume.com/

+1

Sampson Jan 15 '10 at 2:01

Use the following regular expression:

    \b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.])(?:[^\s()<>]+|\([^\s()<>]+\))+(?:\([^\s()<>]+\)|[^`!()\[\]{};:'".,<>?«»""''\s]))
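The answer gives only the pattern, so here is a hedged sketch of how it might be applied with preg_match(). Several details are assumptions and not part of the answer: the first character class is taken to be the range `[a-z]`, the `~` delimiters are arbitrary, and the `i` and `u` modifiers are added for case-insensitive matching and for the non-ASCII characters in the pattern. A nowdoc is used so that none of the quotes or backslashes need extra escaping.

```php
<?php
// The pattern from the answer, wrapped in "~" delimiters. The "i" and "u"
// modifiers are assumptions; the answer does not say which ones to use.
$pattern = <<<'REGEX'
~\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.])(?:[^\s()<>]+|\([^\s()<>]+\))+(?:\([^\s()<>]+\)|[^`!()\[\]{};:'".,<>?«»""''\s]))~iu
REGEX;

$var  = '<a href="http://stackoverflow.com/">Stack Overflow</a>';
$var2 = null;

// Group 1 is the whole URL; the last alternative of the pattern refuses to
// end the match on punctuation such as a closing double quote.
if (preg_match($pattern, $var, $match)) {
    $var2 = $match[1];
    echo $var2, PHP_EOL; // http://stackoverflow.com/
}
```

Note that this pattern looks for URLs anywhere in the text, not specifically inside href attributes, so it would also pick up a URL that appears in the link text.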

0

Alix axel Jan 15 '10 at 2:17

    <?php
    // $html holds the markup to scan
    preg_match_all("#<a\s[^>]*href\s*=\s*[\'\"]??\s*?(?'path'[^\'\"\s]+?)[\'\"\s]{1}[^>]*>(?'name'[^>]*)<#simU", $html, $hrefs, PREG_SET_ORDER);
    foreach ($hrefs as $urls) {
        print $urls['path'] . "<br>";
    }
    ?>

0

Andrew W Apr 11 '15 at 12:00

ghostdog74 · Accepted Answer · 2010-01-15T02:18:23+0000

Instead of handling one long, complex regex, do it in steps:

    $str = '<a href="http://stackoverflow.com/"> Stack Overflow</a>';
    $str = preg_replace("/.*<a\s+href=\"/", "", $str);
    print preg_replace("/\">.*/", "", $str);

Here is one way to do it without a regex, using explode():

    $str = '<a href="http://stackoverflow.com/"> Stack Overflow</a>';
    $s = explode('href="', $str);
    $t = explode('">', $s[1]);
    print $t[0];
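The explode() approach assumes the string always contains href=" and that the attribute is immediately followed by ">. As a rough sketch of how those assumptions could be loosened, the hypothetical helper below (hrefByExplode() is an illustrative name, not from the answer) returns null when no href is present and stops at the next double quote, so further attributes after href do not end up in the result. It still assumes the value is wrapped in double quotes.

```php
<?php
// Hypothetical helper: same explode() idea as above, but returns null
// instead of reading a missing array index when href=" is absent.
// Assumes the attribute value is wrapped in double quotes.
function hrefByExplode(string $str): ?string
{
    $parts = explode('href="', $str, 2);
    if (count($parts) < 2) {
        return null; // no href=" in the input
    }
    // Everything up to the next double quote is the attribute value.
    return explode('"', $parts[1], 2)[0];
}

echo hrefByExplode('<a href="http://stackoverflow.com/"> Stack Overflow</a>'), PHP_EOL;
// http://stackoverflow.com/

var_dump(hrefByExplode('<a name="top">No link</a>')); // NULL
```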
