Regexp to find everything between the <a> and </a> tags

I am trying to find a way to list everything between the <a> and </a> tags. I have a list of links, and I want to get the names of the links (not where the links point to, but what they are called on the page). It would be really helpful for me.

I currently have this:

$lines = preg_split("/\r?\n|\r/", $content);  // content is the given page
foreach ($lines as $val) {
  if (preg_match("/(<A(.*)>)(<\/A>)/", $val, $alink)) {     
    $newurl = $alink[1];

    // put in array of found links
    $links[$index] = $newurl;
    $index++;
    $is_href = true;
  }
}
+5
9 answers

The standard disclaimer applies: parsing HTML with regular expressions is not ideal. Success depends on well-formed input at the character level. If you cannot guarantee that, the regular expression will fail to do the right thing at some point.

Having said that:

<a\b[^>]*>(.*?)</a>   // match group one will contain the link text
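
As a minimal sketch of how that pattern might be used from PHP (the sample markup and variable names are made up for illustration), preg_match_all collects group one for every link. The i and s modifiers are an added assumption, so that <A> tags and link text spanning several lines also match:

```php
<?php
// Hypothetical sample input; any HTML fragment would do.
$content = '<p><a href="/one">First link</a> and <A HREF="/two" class="x">Second
link</A></p>';

// '#' is used as the delimiter so the slash in </a> needs no escaping.
// i: case-insensitive tag names, s: let "." match newlines in the link text.
$pattern = '#<a\b[^>]*>(.*?)</a>#is';

preg_match_all($pattern, $content, $matches);

// $matches[1] holds the text captured by group one for each link.
$linkNames = $matches[1];
print_r($linkNames);
```

Running the pattern over the whole page at once, rather than line by line, is what lets a link that wraps across lines still be found.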
+15

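Don't use a regular expression for this; use an HTML parser.

I Googled for a PHP HTML parser and found that such parsers exist for PHP.

If the input is XHTML, PHP's XML parser can be used as well.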

+3
<a\s*(.*)\>(.*)</a>

<a href="http://www.stackoverflow.com">Go to stackoverflow.com</a>

$1 = href="http://www.stackoverflow.com"

$2 = Go to stackoverflow.com
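
One caveat worth checking, shown here as a small sketch with made-up input: because both groups are greedy, a line containing two links is captured as a single match, whereas lazy quantifiers keep the links apart. The lazy pattern below is an illustrative variation, not part of the answer above.

```php
<?php
// Hypothetical input with two links on the same line.
$html = '<a href="/a">One</a> <a href="/b">Two</a>';

// Greedy version, as given above: ".*" grabs as much as it can.
preg_match_all('#<a\s*(.*)\>(.*)</a>#', $html, $greedy);

// Lazy variation: ".*?" stops at the first ">" and the first "</a>".
preg_match_all('#<a\s*(.*?)>(.*?)</a>#', $html, $lazy);

var_dump(count($greedy[0])); // number of matches with the greedy pattern
print_r($lazy[2]);           // link texts found by the lazy pattern
```

With this input the greedy pattern reports one match spanning both links, while the lazy one reports the two link texts separately.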

+2


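<TAG\b[^>]*>(.*?)</TAG> matches the opening and closing pair of a specific HTML tag, according to RegexBuddy. Anything between the tags is captured into the first backreference. The question mark makes the star lazy, so it stops at the first closing tag rather than the last, as a greedy star would. This regex will not properly match tags nested inside themselves, as in <TAG>one<TAG>two</TAG>one</TAG>.

<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1> matches the opening and closing pair of any HTML tag, again according to RegexBuddy. Be sure to turn off case sensitivity. The key is the backreference \1, which makes the closing tag repeat the name captured by the opening tag. Anything between the tags is captured into the second backreference. This solution will also not match tags nested inside themselves.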
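
To make the backreference version concrete, here is a hedged sketch in PHP. The sample markup is invented, and the i modifier stands in for "turn off case sensitivity":

```php
<?php
// Hypothetical sample markup mixing several tag names.
$html = '<b>bold</b> <a href="/x">a link</a> <em>stressed</em>';

// Group 1 captures the tag name; \1 requires the closing tag to repeat it.
// Group 2 captures everything between the opening and closing tag.
$pattern = '#<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>#i';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$found = [];
foreach ($matches as $m) {
    $found[$m[1]] = $m[2]; // tag name => inner text
}
print_r($found);
```

PREG_SET_ORDER groups the results per match, which keeps each tag name next to its inner text.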


0

As a Perl regexp, it would be something like:

m!<a .*?>(.*?)</a>!i


0

To collect every match in one pass, use preg_match_all.

Example:

$pattern = '#<a[^>]*>([^<]*)<\/a>#';
$subject = '<a href="#">Link 1</a> <a href="#">Link 3</a> <a href="#">Link 3</a>';
preg_match_all($pattern, $subject, $matches);
print_r($matches[1]);

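// If the link text may itself contain "<" or ">", use a lazy match instead:
$pattern = '#<a[^>]*>(.*?)<\/a>#';
$subject = '<a href="#">2 > 1</a> <a href="#">1 < 2</a>';
preg_match_all($pattern, $subject, $matches);

Output of the first example: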

Array (
 [0] => Link 1
 [1] => Link 3
 [2] => Link 3
)
0

This pattern:

'<a.*?>(.*?)</a>'

returns:

['sign up', 'log in', 'careers 2.0']

for this input:

<span id="hlinks-nav"><a href="/users/login?returnurl=%2fquestions%2f343115%2fregexp-for-finding-everything-between-a-and-a-tags">sign up</a><span class="lsep">|</span><a href="/users/login?returnurl=%2fquestions%2f343115%2fregexp-for-finding-everything-between-a-and-a-tags">log in</a><span class="lsep">|</span><a href="http://careers.stackoverflow.com">careers 2.0</a><span class="lsep">|</span></span>
0

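This variant requires a closing quote ["'] before the end of the opening tag and uses the i and s flags, so it is case-insensitive and matches across lines: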

<a\s.*?['"]\s*>((?:(?!<\/a>).)*)<\/a>

Test

$re = '/<a\s.*?[\'"]\s*>((?:(?!<\/a>).)*)<\/a>/si';
$str = '<a href="https://google.com"
title="some title"
data-key="{\'key\':\'adf0a8dfq<>*1$4%\' >

some context in here <>

some context in there <>

</a>

<A href="https://google.com"
title="some title"
data-key="{\'key\':\'adf0a8dfq<>*1$4%\'>

some context in here

some context in there

</A>';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

var_dump($matches);

// The expression is explained on regex101.com, if you wish to explore, simplify, or modify it.


RegEx Circuit

jex.im visualizes regular expressions.

0

As other answers note, regex is not a reliable way to parse HTML. A DOM parser such as DOMDocument (optionally with DOMXPath) is a more robust choice.

Code:

$html = <<<HTML
<a href="#">hello</a> <abbr href="#">FYI</abbr> <a title="goodbye">later</a>
<a href=https://example.com>no quoted attributes</a>
<A href="https://example.com"
title="some title"
data-key="{\'key\':\'adf0a8dfq<>*1$4%\'">a link with data attribute</A>
and
this is <a title="hello">not a hyperlink</a> but simply an anchor tag
HTML;

$dom = new DOMDocument; 
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$linkText = [];
foreach ($xpath->evaluate("//a[@href]") as $node) {
    $linkText[] = $node->nodeValue;
}
var_export($linkText);

Output:

array (
  0 => 'hello',
  1 => 'no quoted attributes',
  2 => 'a link with data attribute',
)    

To get the text of every <a> tag, with or without an href attribute:

Code:

$doc = new DOMDocument();
$doc->loadHTML($html);
$aTags = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $aTags[] = $a->nodeValue;
}
var_export($aTags);

Output:

array (
  0 => 'hello',
  1 => 'later',
  2 => 'no quoted attributes',
  3 => 'a link with data attribute',
  4 => 'not a hyperlink',
)
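
With real-world markup, loadHTML may emit warnings for anything libxml considers malformed. A possible wrapper, sketched here with a hypothetical function name (extractLinkTexts) and the assumption that surrounding whitespace should be trimmed, collects those warnings internally and returns the link texts:

```php
<?php
/**
 * Hypothetical helper: returns the trimmed text of every <a> element,
 * or only those with an href attribute when $hrefOnly is true.
 */
function extractLinkTexts(string $html, bool $hrefOnly = false): array
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $query = $hrefOnly ? '//a[@href]' : '//a';

    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$sample = '<a href="#">hello</a> <abbr title="x">FYI</abbr> <a title="goodbye"> later </a>';
print_r(extractLinkTexts($sample));        // all <a> elements
print_r(extractLinkTexts($sample, true));  // only <a> elements with href
```

Restoring the previous libxml error setting keeps the helper from changing behavior elsewhere in the script.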
0