Wrap links in <a> tags with a regular expression

I need to wrap all links in a text in an <a> tag using a regular expression in PHP, except for those that are already wrapped.

So I have the text:

Some text with html here
http://www.somelink.html
http://www.somelink.com/view/?id=95
<a href="http://anotherlink.html">http://anotherlink.html</a>
<a href="http://anotherlink.html">Title</a>

What I need to get:

Some text with html here
<a href="http://www.somelink.html">http://www.somelink.html</a>
<a href="http://www.somelink.com/view/?id=95">http://www.somelink.com/view/?id=95</a>
<a href="http://anotherlink.html">http://anotherlink.html</a>
<a href="http://anotherlink.html">Title</a>

I can match the links with this expression:

(?:(?:https?|ftp):\/\/|www.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]

but it also matches the links that are already inside <a> tags.

3 answers

For reliability, I would split the content on <a> elements (including their child content) and on all other tags (excluding their child content), and then look for URLs only in the text pieces that remain, for example:

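// Split on whole <a>...</a> elements and on any other single tag; the delimiters are kept in the result.
$bits = preg_split('/(<a(?:\s+[^>]*)?>.*?<\/a>|<\/?[a-z][^>]*>)/is', $content, -1, PREG_SPLIT_DELIM_CAPTURE);

$reconstructed = '';

foreach ($bits as $bit) {
  if (strpos($bit, '<') !== 0) { // plain text: not an <a> element and not a tag, so check for URLs
    $bit = link_urls($bit); // link_urls() is your own function that wraps the URLs it finds
  }
  $reconstructed .= $bit;
}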
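
The snippet above calls link_urls() without defining it. A minimal sketch of such a helper, together with a wrapper around the split-and-rebuild loop, could look like the following. The function bodies, the name wrap_bare_urls(), and the choice to prefix scheme-less "www." links with http:// are illustrative assumptions and not part of the original answer. The URL pattern is the one from the question, with the dot in "www." escaped.

```php
<?php
// Hypothetical helper: wrap every bare URL in a plain-text fragment in an <a> tag.
function link_urls(string $text): string
{
    // Pattern from the question, with the dot in "www." escaped.
    $pattern = '/(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i';

    $linked = preg_replace_callback($pattern, function (array $m): string {
        $url = $m[0];
        // Assumption: links that start with "www." get an http:// scheme in the href.
        $href = stripos($url, 'www.') === 0 ? 'http://' . $url : $url;
        return '<a href="' . $href . '">' . $url . '</a>';
    }, $text);

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $linked ?? $text;
}

// Hypothetical wrapper around the split-and-rebuild loop.
function wrap_bare_urls(string $content): string
{
    // Delimiters: whole <a>...</a> elements, or any other opening or closing tag.
    $bits = preg_split(
        '/(<a(?:\s+[^>]*)?>.*?<\/a>|<\/?[a-z][^>]*>)/is',
        $content,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $reconstructed = '';
    foreach ($bits as $bit) {
        // Pieces that start with "<" are <a> elements or tags and are kept as they are.
        $reconstructed .= (strpos($bit, '<') !== 0) ? link_urls($bit) : $bit;
    }
    return $reconstructed;
}

$content = "Some text with html here\n"
    . "http://www.somelink.html\n"
    . "<a href=\"http://anotherlink.html\">Title</a>\n";

echo wrap_bare_urls($content);
```

Because the input is assumed to be HTML already, the sketch does not escape the URL again before placing it in the href attribute.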

You could use a negative lookbehind. Syntax:

(?<!text)

So in your case it will be:

(?<!\<a)

Or something close to the above.
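
To make the lookbehind idea more concrete: inside an existing link, a URL is preceded either by the opening quote of href=" or by the > that ends the opening tag, so the lookbehind has to reject those characters rather than the literal text <a. The following is a rough sketch under that assumption. The variable names and the extra / in the lookbehind (which stops a match from starting at the "www." part of an http://www. URL) are my own additions. It misses URLs that directly follow some other tag, and it wrongly wraps link text that is separated from its opening tag by whitespace, so treat it as an approximation only.

```php
<?php
// Reject a URL when the character just before it is a double quote (href="..."),
// ">" (end of an opening tag) or "/" (the "www." part of "http://www.").
$pattern = '/(?<![">\/])(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i';

$text = "Some text with html here\n"
    . "http://www.somelink.html\n"
    . "<a href=\"http://anotherlink.html\">http://anotherlink.html</a>\n";

// $0 in the replacement refers to the whole matched URL.
$result = preg_replace($pattern, '<a href="$0">$0</a>', $text);

echo $result;
```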


Here is one way to do it (in Perl):

use strict;
use warnings;

my $html = '
  http://Top.html

  Some text with more html here
  <a href="http://www.somelink.html">
        http://www.somelink.html
  </a>

  <a href="http://www.somelink.com/view/?id=2495">
       http://www.somelink.com/view/?id=95
  </a>

  <a href="http://anotherlink.html">
       http://anotherlink.html
  </a>

  http://andone.html
  http://andtwo.html

  <a href="http://anthisisotherlink.html"><mn>
       Title
     http://this  <br>
       <b href="http://erlink.html">
     asdf
  </a> 
';

{
 no warnings;
 $html =~ 

 # Regex (global replace) ..
  s{(?is)
      (<   (?:DOCTYPE.*?|--.*?--)
         | script\s[^>]*>.*?</script\s*
         | style\s[^>]*>.*?</style\s*
         | a\s[^>]*>.*?</a\s*
         | (?:/?\w+\s*/?|(?:\w+\s+(?:".*?"|'.*?'|[^>]*?)+\s*/?))
        >
      )
    | ( (?:
         (?!(?:(?:https?|ftp)://|www.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|])
         [^<]
        )*?
      )
    | ( (?:(?:https?|ftp)://|www.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|] )
  }

 # Replacement (would be a callback function in php) ..
  {
     defined $3 ? "<a href=\"$3\">$3</a>" : "$1$2"
  }xeg;
}

print $html,"\n";

Output:

  <a href="http://Top.html">http://Top.html</a>

  Some text with more html here
  <a href="http://www.somelink.html">
        <a href="http://www.somelink.html">http://www.somelink.html</a>
  </a>

  <a href="http://www.somelink.com/view/?id=2495">
       http://www.somelink.com/view/?id=95
  </a>

  <a href="http://anotherlink.html">
       http://anotherlink.html
  </a>

  <a href="http://andone.html">http://andone.html</a>
  <a href="http://andtwo.html">http://andtwo.html</a>

  <a href="http://anthisisotherlink.html"><mn>
       Title
     http://this  <br>
       <b href="http://erlink.html">
     asdf
  </a>
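
For PHP, the same single-pass idea can be sketched with preg_replace_callback(): one alternative consumes everything that must stay untouched, another captures a bare URL, and the callback wraps only the latter. This is a simplified port and not an exact translation of the Perl pattern above; the reduced tag handling, the variable names, and the sample input are my assumptions.

```php
<?php
// Bare URL pattern from the question, with the dot in "www." escaped.
$url = '(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]';

// Markup that must be left alone: comments, script and style blocks,
// whole <a> elements including their text, and any other single tag.
$skip = '<!--.*?-->'
    . '|<script\b[^>]*>.*?<\/script\s*>'
    . '|<style\b[^>]*>.*?<\/style\s*>'
    . '|<a\s[^>]*>.*?<\/a\s*>'
    . '|<[^>]*>';

// Group 1 = markup to skip, group 2 = bare URL.
$pattern = '/(' . $skip . ')|(' . $url . ')/is';

$html = "http://Top.html\n"
    . "<a href=\"http://www.somelink.html\">\n  http://www.somelink.html\n</a>\n"
    . "http://andone.html <b title=\"http://erlink.html\">asdf</b>\n";

$result = preg_replace_callback($pattern, function (array $m): string {
    // Group 2 is present and non-empty only when the URL alternative matched.
    if (isset($m[2]) && $m[2] !== '') {
        return '<a href="' . $m[2] . '">' . $m[2] . '</a>';
    }
    // Tags, comments and existing links are returned unchanged.
    return $m[0];
}, $html);

echo $result;
```

The markup alternative is listed first, so at the start of any tag or existing link the engine consumes that markup as a whole before a URL inside it can be seen.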