Regex for automatically linking URLs

I use a PHP function to automatically turn URLs in a text string into links that people can click on. It works in most cases, but I have found some where it does not.

I don’t understand regular expressions at all, so I was hoping that someone could help me with this.

Here is the pattern that I am currently using:

$pattern = "/(((http[s]?:\/\/)|(www\.))(([a-z][-a-z0-9]+\.)?[a-z][-a-z0-9]+\.[a-z]+(\.[a-z]{2,2})?)\/?[a-z0-9.,_\/~#&=;%+?-]+[a-z0-9\/#=?]{1,1})/is";

However, here are some links that I found that this pattern does not match:

  • www.oakvilletransit.ca - Not sure, but I assume it doesn't match because of the two-letter country code.
  • www.grt.ca - Another .ca domain that does not work.
  • Several other .ca addresses.
  • freepublictransports.com - An address without www. or http:// in front of it. I would like these to work too.
  • www.222tips.com - I assume it doesn't match because of the digits at the beginning of the address.

Does anyone know how I can modify this regex pattern so that it also matches these cases?

EDIT: It should also handle URLs that are followed by a period. If the URL is the last part of a sentence, the trailing period should not be included in the actual link. The current pattern already takes this into account.

EDIT 2: I am now using the pattern as follows:

    $pattern = "/((http|https):\/\/)?([a-z0-9-]+\.)?[a-z][a-z0-9-]+(\.[a-z]{2,6}){1,3}(\/[a-z0-9.,_\/~#&=;%+?-]*)?/is";
    $string = preg_replace($pattern, " <a target='_blank' href='$1'>$1</a>", $string);

    // fix URLs without protocols
    $string = preg_replace("/href='www/", "href='http://www", $string);

    return $string;
2 answers

The following regex will match URLs:

  • (Optional) starting with http:// or https://
  • (Optional) with a subdomain (www.example.com, help.example.com, etc.)
  • With 1-3 domain extensions, each 2-6 characters long (www.example.com.gu, www.example.com.au.museum, etc.)
  • (Optional) with a slash at the end
  • (Optional) with valid characters after the slash

The i modifier at the end makes the match case-insensitive.

/((http|https):\/\/)?([a-z0-9-]+\.)?[a-z0-9-]+(\.[a-z]{2,6}){1,3}(\/[a-z0-9.,_\/~#&=;%+?-]*)?/is
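A quick way to check this pattern against the addresses listed in the question is a small script like the one below. It is only a sketch: the pattern is the one above with the character ranges written as [a-z], and the loop simply reports whether each address is matched in full.

```php
<?php
// The pattern from this answer, with ranges written out as [a-z].
$pattern = "/((http|https):\/\/)?([a-z0-9-]+\.)?[a-z0-9-]+(\.[a-z]{2,6}){1,3}(\/[a-z0-9.,_\/~#&=;%+?-]*)?/is";

// Addresses the question reports as failing with the original pattern.
$urls = [
    'www.oakvilletransit.ca',
    'www.grt.ca',
    'freepublictransports.com',
    'www.222tips.com',
];

foreach ($urls as $url) {
    // $m[0] holds the full text matched by the pattern.
    $matched = preg_match($pattern, $url, $m) === 1 && $m[0] === $url;
    echo $url, ' => ', $matched ? 'full match' : 'no full match', PHP_EOL;
}
```

All four addresses should be reported as a full match.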

Edit: This will not match any "trailing" period at the end (for example, at the end of a sentence), since that period is not part of the URL and should not be included in the href attribute of your link.
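To see how trailing periods behave, the sketch below runs the pattern on two sample sentences. The sentences and the /routes path are made up for illustration. Note one caveat: a period is excluded after a bare domain, but the path character class contains ".", so a period directly after a path is kept. The rtrim() call at the end is one possible workaround and is an assumption, not part of this answer.

```php
<?php
$pattern = "/((http|https):\/\/)?([a-z0-9-]+\.)?[a-z0-9-]+(\.[a-z]{2,6}){1,3}(\/[a-z0-9.,_\/~#&=;%+?-]*)?/is";

// Bare domain at the end of a sentence: the final period is left out.
preg_match($pattern, 'Go to www.grt.ca.', $bare);
echo $bare[0], PHP_EOL; // www.grt.ca

// URL with a (hypothetical) path: "." is allowed in the path, so it is kept.
preg_match($pattern, 'See www.grt.ca/routes.', $withPath);
echo $withPath[0], PHP_EOL; // www.grt.ca/routes.

// Possible workaround (assumption): trim sentence punctuation from the match.
$trimmed = rtrim($withPath[0], '.,;');
echo $trimmed, PHP_EOL; // www.grt.ca/routes
```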

Edit 2: In the first preg_replace(), change $1 to $0. This inserts the entire matched string instead of only the first captured group.
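As a rough illustration of the $1 versus $0 point: in this pattern, group 1 is the optional protocol, so $1 is empty for an address such as www.grt.ca, while $0 is the whole match. The function name autolinkUrls is hypothetical and used only for this sketch.

```php
<?php
// Hypothetical helper name; the pattern is the one from this answer.
function autolinkUrls(string $string): string
{
    $pattern = "/((http|https):\/\/)?([a-z0-9-]+\.)?[a-z0-9-]+(\.[a-z]{2,6}){1,3}(\/[a-z0-9.,_\/~#&=;%+?-]*)?/is";

    // $0 is the entire match; $1 would only be the optional protocol group.
    // PHP does not interpolate "$0" in a double-quoted string, so it reaches
    // preg_replace() unchanged.
    return preg_replace($pattern, "<a target='_blank' href='$0'>$0</a>", $string);
}

echo autolinkUrls('Visit www.grt.ca today'), PHP_EOL;
// Visit <a target='_blank' href='www.grt.ca'>www.grt.ca</a> today
```

The href still lacks a protocol here, which is what the second preg_replace() in the question is meant to fix.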

Edit 3 (regarding EDIT 2 in the question): Here is a better way to check for http:// or https:// at the beginning of the href:

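    // Prepend http:// to any href that does not already start with http:// or https://.
    // The negative lookahead (?!...) checks for the protocol without consuming any of the URL.
    $string = preg_replace("/href='(?!https?:\/\/)/i", "href='http://", $string);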
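An alternative to patching the href in a second pass is to decide per match whether a protocol is needed. The sketch below uses preg_replace_callback() for that. The function name autolinkWithProtocol is hypothetical, and escaping the values with htmlspecialchars() is an added precaution rather than something this answer prescribes.

```php
<?php
// Hypothetical helper: links each match and adds "http://" to the href
// when the matched text has no protocol of its own.
function autolinkWithProtocol(string $string): string
{
    $pattern = "/((http|https):\/\/)?([a-z0-9-]+\.)?[a-z0-9-]+(\.[a-z]{2,6}){1,3}(\/[a-z0-9.,_\/~#&=;%+?-]*)?/is";

    return preg_replace_callback($pattern, function (array $m): string {
        $text = $m[0];
        // Group 1 is the optional protocol; it is empty when none was matched.
        $href = ($m[1] ?? '') === '' ? 'http://' . $text : $text;
        // Escape both values before placing them in HTML.
        return "<a target='_blank' href='" . htmlspecialchars($href, ENT_QUOTES) . "'>"
            . htmlspecialchars($text, ENT_QUOTES) . '</a>';
    }, $string);
}

echo autolinkWithProtocol('see www.grt.ca'), PHP_EOL;
// see <a target='_blank' href='http://www.grt.ca'>www.grt.ca</a>
```

This keeps the visible link text exactly as the author typed it and only changes the href.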

I had problems with all the above examples.

Here is what works:

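    function autolink($string) {
        // Note: the U modifier inverts greediness, so "+?" is greedy here and
        // the group captures everything up to the next whitespace character.
        $string = preg_replace("#http://([\S]+?)#Uis", '<a href="http://\\1">\\1</a>', $string);
        return $string;
    }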