Regexp to find everything between the <a> and </a> tags
I am trying to find a way to list everything between <a> and </a> tags. I have a list of links, and I want to get the names of the links (not where the links point, but what they are called on the page). Any help would be appreciated.
I currently have this:
$links = [];
$index = 0;
$lines = preg_split("/\r?\n|\r/", $content); // content is the given page
foreach ($lines as $val) {
    if (preg_match("/(<A(.*)>)(<\/A>)/", $val, $alink)) {
        $newurl = $alink[1];
        // put in array of found links
        $links[$index] = $newurl;
        $index++;
        $is_href = true;
    }
}
9 answers
The standard disclaimer applies: parsing HTML with regular expressions is not ideal. Success depends on the input being exactly what you expect at the character level. If you cannot guarantee that, the regular expression will fail to do the right thing at some point.
Having said that:
<a\b[^>]*>(.*?)</a> // match group one will contain the link text
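A minimal PHP sketch of applying this pattern, assuming the whole page is in `$content` as in the question. Instead of splitting the page into lines, it runs preg_match_all over the whole string, so links whose text spans several lines are still found. The `i` modifier makes `<A>` match as well as `<a>`, and `s` lets `.` cross newlines; `extractLinkTexts` is a hypothetical helper name, not an existing function.

```php
<?php
// Hypothetical helper: returns the text between each <a ...> and </a>.
function extractLinkTexts(string $content): array
{
    // \b stops tags such as <abbr> from matching as <a>;
    // i = case-insensitive (<A>...</A>), s = "." also matches newlines.
    $pattern = '#<a\b[^>]*>(.*?)</a>#is';
    preg_match_all($pattern, $content, $matches);
    // $matches[1] holds capture group 1 (the link text) for every match.
    return $matches[1];
}

$content = "<A HREF=\"/one\">First</A>\n<abbr title=\"x\">FYI</abbr>\n<a href=\"/two\">Second\nlink</a>";
print_r(extractLinkTexts($content));
```

Using `#` as the delimiter means the `/` in `</a>` does not need escaping.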
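To collect every match rather than only the first, use preg_match_all.
For example:
$pattern = '#<a[^>]*>([^<]*)<\/a>#';
$subject = '<a href="#">Link 1</a> <a href="#">Link 3</a> <a href="#">Link 3</a>';
preg_match_all($pattern, $subject, $matches);
print_r($matches[1]);
Output:
Array (
[0] => Link 1
[1] => Link 3
[2] => Link 3
)
If the link text itself may contain a < character (as in "1 < 2"), [^<]* stops too early; use a lazy (.*?) group instead:
$pattern = '#<a[^>]*>(.*?)<\/a>#';
$subject = '<a href="#">2 > 1</a> <a href="#">1 < 2</a>';
preg_match_all($pattern, $subject, $matches);
print_r($matches[1]);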
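A further difference between the two patterns, shown as a small sketch on a hypothetical input: `[^<]*` cannot cross a nested tag, so a link whose text contains markup such as `<b>` is skipped entirely, while `(.*?)` captures the inner HTML. Passing the result through `strip_tags` (an added post-processing step, not part of the answer above) leaves only the visible text.

```php
<?php
$subject = '<a href="#"><b>Bold</b> link</a> <a href="#">Plain</a>';

// [^<]* stops at the first "<", so the link containing <b> cannot match.
preg_match_all('#<a[^>]*>([^<]*)</a>#', $subject, $strict);
print_r($strict[1]); // only "Plain"

// (.*?) lazily captures everything up to the next </a>, inner tags included.
preg_match_all('#<a[^>]*>(.*?)</a>#', $subject, $lazy);
print_r($lazy[1]); // "<b>Bold</b> link" and "Plain"

// strip_tags() removes the inner markup, leaving only the visible text.
$texts = array_map('strip_tags', $lazy[1]);
print_r($texts); // "Bold link" and "Plain"
```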
The pattern '<a.*?>(.*?)</a>'
returns ['sign up', 'log in', 'careers 2.0']
for the following markup:
<span id="hlinks-nav"><a href="/users/login?returnurl=%2fquestions%2f343115%2fregexp-for-finding-everything-between-a-and-a-tags">sign up</a><span class="lsep">|</span><a href="/users/login?returnurl=%2fquestions%2f343115%2fregexp-for-finding-everything-between-a-and-a-tags">log in</a><span class="lsep">|</span><a href="http://careers.stackoverflow.com">careers 2.0</a><span class="lsep">|</span></span>
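A PHP sketch of one caveat with `<a.*?>`, using a hypothetical input: because nothing constrains the character after `a`, the pattern also starts matching at tags such as `<abbr>`, and the lazy groups then run on to the next `</a>`. Adding `\b` and using `[^>]*` for the opening tag, as in the first answer, avoids this.

```php
<?php
$html = '<abbr title="x">FYI</abbr> <a href="#">ok</a>';

// <a.*?> also accepts '<abbr title="x">', so the capture runs on to the next </a>.
preg_match_all('#<a.*?>(.*?)</a>#', $html, $loose);
print_r($loose[1]); // one match: 'FYI</abbr> <a href="#">ok'

// \b requires a word boundary right after "a", so <abbr> is rejected.
preg_match_all('#<a\b[^>]*>(.*?)</a>#', $html, $strict);
print_r($strict[1]); // one match: 'ok'
```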
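If attribute values may contain > characters and the tag may span several lines, you can end the opening tag on a closing quote ["'] and use the i and s flags, for example: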
<a\s.*?['"]\s*>((?:(?!<\/a>).)*)<\/a>
Test
$re = '/<a\s.*?[\'"]\s*>((?:(?!<\/a>).)*)<\/a>/si';
$str = '<a href="https://google.com"
title="some title"
data-key="{\'key\':\'adf0a8dfq<>*1$4%\' >
some context in here <>
some context in there <>
</a>
<A href="https://google.com"
title="some title"
data-key="{\'key\':\'adf0a8dfq<>*1$4%\'>
some context in here
some context in there
</A>';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
If you want to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com.
RegEx Circuit: jex.im visualizes regular expressions (diagram of the expression above).
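If only the link texts are needed, a variation of the test code above can use the default PREG_PATTERN_ORDER and tidy the captured text, which still includes the surrounding newlines. This is a sketch with a hypothetical, shortened input; collapsing whitespace with `preg_replace` is an added step, not part of the answer.

```php
<?php
$re = '/<a\s.*?[\'"]\s*>((?:(?!<\/a>).)*)<\/a>/si';

// Hypothetical input: a ">" inside an attribute value, and text spread over lines.
$str = "<a href=\"https://example.com\"\n   data-x=\"a>b\">\n  first\n  link\n</a>\n"
     . "<A href=\"https://example.com\">second</A>";

// Default flag PREG_PATTERN_ORDER: $matches[1] lists every capture of group 1.
preg_match_all($re, $str, $matches);

// Trim the ends and collapse internal runs of whitespace into single spaces.
$texts = array_map(
    fn (string $t): string => preg_replace('/\s+/', ' ', trim($t)),
    $matches[1]
);
print_r($texts); // "first link" and "second"
```

The `>` inside `data-x="a>b"` does not end the opening tag early, because the pattern only accepts a `>` that directly follows a quote (plus optional whitespace).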
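Regular expressions are not a reliable way to parse HTML (the OP is working with HTML, so an HTML parser is the appropriate tool).
Use DOMDocument (together with DOMXPath when you need to filter the results). Unlike the regex answers, it copes with unquoted attribute values, uppercase tag names, attributes spread over several lines, and > characters inside attribute values, and it will not mistake <abbr> for <a>. The XPath query //a[@href] also skips anchor tags that have no href attribute, i.e. ones that are not hyperlinks.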
$html = <<<HTML
<a href="#">hello</a> <abbr href="#">FYI</abbr> <a title="goodbye">later</a>
<a href=https://example.com>no quoted attributes</a>
<A href="https://example.com"
title="some title"
data-key="{\'key\':\'adf0a8dfq<>*1$4%\'">a link with data attribute</A>
and
this is <a title="hello">not a hyperlink</a> but simply an anchor tag
HTML;
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$linkText = [];
foreach ($xpath->evaluate("//a[@href]") as $node) {
$linkText[] = $node->nodeValue;
}
var_export($linkText);
Output:
array (
0 => 'hello',
1 => 'no quoted attributes',
2 => 'a link with data attribute',
)
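If you want the text of every <a> tag, whether or not it has an href attribute, use getElementsByTagName() instead: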
$doc = new DOMDocument();
$doc->loadHTML($html);
$aTags = [];
foreach ($doc->getElementsByTagName('a') as $a) {
$aTags[] = $a->nodeValue;
}
var_export($aTags);
Output:
array (
0 => 'hello',
1 => 'later',
2 => 'no quoted attributes',
3 => 'a link with data attribute',
4 => 'not a hyperlink',
)
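On real pages, DOMDocument::loadHTML() emits warnings for markup its HTML parser does not recognise, such as HTML5 elements like `<nav>`. A sketch that collects those errors internally instead of printing them, and trims whitespace from each link text; `linkTexts` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: text of every <a> that has an href attribute.
function linkTexts(string $html): array
{
    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    $xpath = new DOMXPath($dom);
    $texts = [];
    foreach ($xpath->query('//a[@href]') as $node) {
        // nodeValue is the concatenated text of the element and its children.
        $texts[] = trim($node->nodeValue);
    }
    return $texts;
}

$html = '<nav><a href="/"> Home </a><a name="top">Top</a><a href="/faq"><b>FAQ</b></a></nav>';
print_r(linkTexts($html)); // "Home" and "FAQ"
```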
