Replace URLs in Text with HTML Links

Here is the scenario: suppose I put a link such as

http://example.com

in a textarea. How do I get PHP to detect that it is an http:// link and then print it as

 print "<a href='http://www.example.com'>http://www.example.com</a>"; 

I remember doing something like this before; however, it was not foolproof and kept breaking on complex links.

Another good idea: if you have a link like

http://example.com/test.php?val1=bla&val2blablabla%20bla%20bla.bl

fix it so that it does

 print "<a href='http://example.com/test.php?val1=bla&val2=bla%20bla%20bla.bla'>"; print "http://example.com/test.php"; print "</a>"; 

This one is just an afterthought. Stack Overflow could probably use this as well :D

Any ideas?

+56
url php regex preg-replace linkify
Jul 27 '09 at 13:20
15 answers

Let's look at the requirements. You have some user-supplied text that you want to display with its URLs turned into hyperlinks.

  1. The "http://" protocol prefix should be optional.
  2. Both domains and IP addresses should be accepted.
  3. Any valid top-level domain should be accepted, e.g. .aero and .xn--jxalpdlp.
  4. Port numbers should be allowed.
  5. URLs must be allowed in normal sentence contexts. For example, in "Visit stackoverflow.com.", the final period is not part of the URL.
  6. You probably want to allow "https://" URLs as well, and perhaps others.
  7. As always when displaying user-supplied text in HTML, you want to prevent cross-site scripting (XSS). Also, you want ampersands in URLs to be correctly escaped as &amp;.
  8. You probably don't need support for IPv6 addresses.
  9. Edit: As noted in the comments, support for email addresses would definitely be a plus.
  10. Edit: Only plain text input is to be supported; HTML tags in the input should not be honoured. (The Bitbucket version supports HTML input.)
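
Requirements 5 and 7 are the ones most often missed, so here is a minimal sketch of them in isolation. Both helper names are made up for this illustration and are not part of the answer's code. The sketch assumes the candidate URL has already been found by some other means.

```php
<?php
// Hypothetical helper (not from the answer): split a raw URL candidate into
// the URL proper and any trailing sentence punctuation (requirement 5).
function split_trailing_punctuation(string $candidate): array
{
    $url = rtrim($candidate, '.,;:!?"');
    // The cast covers older PHP versions where substr() may return false.
    return [$url, (string) substr($candidate, strlen($url))];
}

// Hypothetical helper: build the anchor, escaping the URL for both the
// attribute and the label so that "&" becomes "&amp;" (requirement 7).
function build_anchor(string $url): string
{
    $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    return '<a href="' . $safe . '">' . $safe . '</a>';
}

list($url, $tail) = split_trailing_punctuation('http://example.com/?a=1&b=2.');
echo build_anchor($url) . htmlspecialchars($tail), "\n";
```

One trade-off of this approach: a URL that legitimately ends in "?" or "!" loses that character, which is usually an acceptable price for handling ordinary sentences.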

Edit: Check out GitHub for the latest version, with support for email addresses, authenticated URLs, URLs in quotes and parentheses, HTML input, as well as an updated TLD list.

Here is my take:

    <?php
    $text = <<<EOD
    Here are some URLs:
    stackoverflow.com/questions/1188129/pregreplace-to-detect-html-php
    Here's the answer: http://www.google.com/search?rls=en&q=42&ie=utf-8&oe=utf-8&hl=en. What was the question?
    A quick look at http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax is helpful.
    There is no place like 127.0.0.1! Except maybe http://news.bbc.co.uk/1/hi/england/surrey/8168892.stm?
    Ports: 192.168.0.1:8080, https://example.net:1234/.
    Beware of Greeks bringing internationalized top-level domains: xn--hxajbheg2az3al.xn--jxalpdlp.
    And remember.Nobody is perfect.

    <script>alert('Remember kids: Say no to XSS-attacks! Always HTML escape untrusted input!');</script>
    EOD;

    $rexProtocol = '(https?://)?';
    $rexDomain   = '((?:[-a-zA-Z0-9]{1,63}\.)+[-a-zA-Z0-9]{2,63}|(?:[0-9]{1,3}\.){3}[0-9]{1,3})';
    $rexPort     = '(:[0-9]{1,5})?';
    $rexPath     = '(/[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]*?)?';
    $rexQuery    = '(\?[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]+?)?';
    $rexFragment = '(#[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]+?)?';

    // Solution 1:

    function callback($match)
    {
        // Prepend http:// if no protocol specified
        $completeUrl = $match[1] ? $match[0] : "http://{$match[0]}";

        return '<a href="' . $completeUrl . '">'
            . $match[2] . $match[3] . $match[4] . '</a>';
    }

    print "<pre>";
    print preg_replace_callback("&\\b$rexProtocol$rexDomain$rexPort$rexPath$rexQuery$rexFragment(?=[?.!,;:\"]?(\s|$))&",
        'callback', htmlspecialchars($text));
    print "</pre>";

  • To properly escape < and & characters, I run the whole text through htmlspecialchars before processing. This is not ideal, since the HTML escaping can cause URL boundaries to be misdetected.
  • As the "And remember.Nobody is perfect." line demonstrates (remember.Nobody is treated as a URL because of the missing space), further checking for valid top-level domains may be in order.
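
To make the first caveat concrete, here is a small sketch of how escaping before matching can move a URL boundary. It uses a deliberately simplified stand-in pattern, which is an assumption for this demo and not the regex from the answer, and the function name is made up.

```php
<?php
// Simplified stand-in pattern (an assumption for this demo, not the
// answer's regex): a scheme followed by anything except whitespace or < >.
function first_url(string $text): ?string
{
    return preg_match('~https?://[^\s<>]+~', $text, $m) ? $m[0] : null;
}

$raw = 'See <http://example.com/a>';

// Matching the raw text stops at the closing angle bracket.
echo first_url($raw), "\n";                    // http://example.com/a

// After escaping, ">" has become "&gt;" and is swallowed by the match.
echo first_url(htmlspecialchars($raw)), "\n";  // http://example.com/a&gt;
```

This is why it tends to be safer to find the URLs in the raw text first and escape each piece afterwards.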

Edit: The following code fixes the two problems above, but is quite a bit more verbose, since I'm more or less re-implementing preg_replace_callback using preg_match.

    // Solution 2:

    $validTlds = array_fill_keys(explode(" ", ".aero .asia .biz .cat .com .coop .edu .gov .info .int .jobs .mil .mobi .museum .name .net .org .pro .tel .travel .ac .ad .ae .af .ag .ai .al .am .an .ao .aq .ar .as .at .au .aw .ax .az .ba .bb .bd .be .bf .bg .bh .bi .bj .bm .bn .bo .br .bs .bt .bv .bw .by .bz .ca .cc .cd .cf .cg .ch .ci .ck .cl .cm .cn .co .cr .cu .cv .cx .cy .cz .de .dj .dk .dm .do .dz .ec .ee .eg .er .es .et .eu .fi .fj .fk .fm .fo .fr .ga .gb .gd .ge .gf .gg .gh .gi .gl .gm .gn .gp .gq .gr .gs .gt .gu .gw .gy .hk .hm .hn .hr .ht .hu .id .ie .il .im .in .io .iq .ir .is .it .je .jm .jo .jp .ke .kg .kh .ki .km .kn .kp .kr .kw .ky .kz .la .lb .lc .li .lk .lr .ls .lt .lu .lv .ly .ma .mc .md .me .mg .mh .mk .ml .mm .mn .mo .mp .mq .mr .ms .mt .mu .mv .mw .mx .my .mz .na .nc .ne .nf .ng .ni .nl .no .np .nr .nu .nz .om .pa .pe .pf .pg .ph .pk .pl .pm .pn .pr .ps .pt .pw .py .qa .re .ro .rs .ru .rw .sa .sb .sc .sd .se .sg .sh .si .sj .sk .sl .sm .sn .so .sr .st .su .sv .sy .sz .tc .td .tf .tg .th .tj .tk .tl .tm .tn .to .tp .tr .tt .tv .tw .tz .ua .ug .uk .us .uy .uz .va .vc .ve .vg .vi .vn .vu .wf .ws .ye .yt .yu .za .zm .zw .xn--0zwm56d .xn--11b5bs3a9aj6g .xn--80akhbyknj4f .xn--9t4b11yi5a .xn--deba0ad .xn--g6w251d .xn--hgbk6aj7f53bba .xn--hlcj6aya9esc7a .xn--jxalpdlp .xn--kgbechtv .xn--zckzah .arpa"), true);

    $position = 0;
    // Note: $match is passed without "&"; call-time pass-by-reference
    // is a fatal error in PHP 5.4 and later.
    while (preg_match("{\\b$rexProtocol$rexDomain$rexPort$rexPath$rexQuery$rexFragment(?=[?.!,;:\"]?(\s|$))}", $text, $match, PREG_OFFSET_CAPTURE, $position))
    {
        list($url, $urlPosition) = $match[0];

        // Print the text leading up to the URL.
        print(htmlspecialchars(substr($text, $position, $urlPosition - $position)));

        $domain = $match[2][0];
        $port   = $match[3][0];
        $path   = $match[4][0];

        // Check if the TLD is valid - or that $domain is an IP address.
        $tld = strtolower(strrchr($domain, '.'));
        if (preg_match('{\.[0-9]{1,3}}', $tld) || isset($validTlds[$tld]))
        {
            // Prepend http:// if no protocol specified
            $completeUrl = $match[1][0] ? $url : "http://$url";

            // Print the hyperlink.
            printf('<a href="%s">%s</a>', htmlspecialchars($completeUrl), htmlspecialchars("$domain$port$path"));
        }
        else
        {
            // Not a valid URL.
            print(htmlspecialchars($url));
        }

        // Continue text parsing from after the URL.
        $position = $urlPosition + strlen($url);
    }

    // Print the remainder of the text.
    print(htmlspecialchars(substr($text, $position)));

+118
Jul 27 '09 at 14:55

Here is something I found that is tried and tested:

    function make_links_blank($text)
    {
        return preg_replace(
            array(
                '/(?(?=<a[^>]*>.+<\/a>) (?:<a[^>]*>.+<\/a>) | ([^="\']?)((?:https?|ftp|bf2|):\/\/[^<> \n\r]+) )/iex',
                '/<a([^>]*)target="?[^"\']+"?/i',
                '/<a([^>]+)>/i',
                '/(^|\s)(www.[^<> \n\r]+)/iex',
                '/(([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@([A-Za-z0-9-]+) (\\.[A-Za-z0-9-]+)*)/iex'
            ),
            array(
                "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\">\\2</a>\\3':'\\0'))",
                '<a\\1',
                '<a\\1 target="_blank">',
                "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\">\\2</a>\\3':'\\0'))",
                "stripslashes((strlen('\\2')>0?'<a href=\"mailto:\\0\">\\0</a>':'\\0'))"
            ),
            $text
        );
    }


This works for me, for both emails and URLs. Sorry for answering my own question. :(

But this is the only one that works.

Here is the link where I found it: http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_21878567.html

Apologies in advance for it being on Experts Exchange.

+14
Jul 27 '09 at 14:24

You guys are talking about advanced and complex stuff, which is good for some situations, but mostly we need a simple, carefree solution. How about simply this?

 preg_replace('/(http[s]{0,1}\:\/\/\S{4,})\s{0,}/ims', '<a href="$1" target="_blank">$1</a> ', $text_msg); 

Just give it a try and let me know which crazy URL it fails on.
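
One thing the one-liner does not do is HTML-escape anything, so it is only safe on trusted input. Below is a sketch of the same simple idea with escaping added; the function name is made up, and the pattern is the one-liner's idea reduced to "scheme plus non-whitespace", not a full URL grammar.

```php
<?php
// Sketch: same idea as the one-liner, but every piece is HTML-escaped
// after the URLs have been located. The function name is hypothetical.
function simple_linkify(string $text): string
{
    // Split the text into alternating non-URL / URL pieces.
    $parts = preg_split('~(https?://\S+)~i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $i => $part) {
        $safe = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        // With a single capture group, odd indexes hold the captured URLs.
        $out .= ($i % 2 === 1)
            ? '<a href="' . $safe . '" target="_blank">' . $safe . '</a>'
            : $safe;
    }
    return $out;
}

echo simple_linkify('Go to http://example.com/?a=1&b=2 now <b>'), "\n";
```

Like the one-liner, this still treats trailing punctuation as part of the URL; it only addresses the escaping.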

+12
Apr 20 '15 at 9:57

Here is the code, using regular expressions in a function:

    <?php
    // Function definitions
    function MakeUrls($str)
    {
        $find = array(
            '`((?:https?|ftp)://\S+[[:alnum:]]/?)`si',
            '`((?<!//)(www\.\S+[[:alnum:]]/?))`si'
        );
        $replace = array(
            '<a href="$1" target="_blank">$1</a>',
            '<a href="http://$1" target="_blank">$1</a>'
        );

        return preg_replace($find, $replace, $str);
    }

    // Function testing
    $str = "www.cloudlibz.com";
    $str = MakeUrls($str);
    echo $str;
    ?>

+4
Mar 21 '14 at 23:51

I use this function, and it works for me:

    function AutoLinkUrls($str, $popup = FALSE)
    {
        if (preg_match_all("#(^|\s|\()((http(s?)://)|(www\.))(\w+[^\s\)\<]+)#i", $str, $matches)) {
            $pop = ($popup == TRUE) ? " target=\"_blank\" " : "";
            for ($i = 0; $i < count($matches['0']); $i++) {
                // A trailing period belongs to the sentence, not the URL.
                $period = '';
                if (preg_match("|\.$|", $matches['6'][$i])) {
                    $period = '.';
                    $matches['6'][$i] = substr($matches['6'][$i], 0, -1);
                }
                $str = str_replace(
                    $matches['0'][$i],
                    $matches['1'][$i] . '<a href="http' .
                        $matches['4'][$i] . '://' .
                        $matches['5'][$i] .
                        $matches['6'][$i] . '"' . $pop . '>http' .
                        $matches['4'][$i] . '://' .
                        $matches['5'][$i] .
                        $matches['6'][$i] . '</a>' .
                        $period,
                    $str
                );
            } // end for
        } // end if
        return $str;
    } // end AutoLinkUrls

All credit goes to http://snipplr.com/view/68586/

Enjoy it!

+2
May 02 '15 at 3:26

This regex should match any link, except for these new 3+ character top-level domains...

    {
      \\b
      # Match the leading part (proto://hostname, or just hostname)
      (
        # http://, or https:// leading part
        (https?)://[-\\w]+(\\.\\w[-\\w]*)+
      |
        # or, try to find a hostname with more specific sub-expression
        (?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \\. )+ # sub domains
        # Now ending .com, etc. For these, require lowercase
        (?-i: com\\b
            | edu\\b
            | biz\\b
            | gov\\b
            | in(?:t|fo)\\b # .int or .info
            | mil\\b
            | net\\b
            | org\\b
            | [a-z][a-z]\\.[a-z][a-z]\\b # two-letter country code
        )
      )

      # Allow an optional port number
      ( : \\d+ )?

      # The rest of the URL is optional, and begins with /
      (
        /
        # The rest are heuristics for what seems to work well
        [^.!,?;"\\'()\[\]\{\}\s\x7F-\\xFF]*
        (
          [.!,?]+ [^.!,?;"\\'()\\[\\]\{\\}\s\\x7F-\\xFF]+
        )*
      )?
    }ix

I didn't write it, and I'm not quite sure where I got it from; sorry that I can't give credit...
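
For readers unfamiliar with the style, the pattern above relies on the x (extended) modifier, which lets a regex span several lines with # comments, and it uses braces as delimiters. Here is a much smaller sketch of the same technique in PHP; the function name and the reduced pattern (scheme, hostname, optional port) are illustrative assumptions, not a replacement for the full expression.

```php
<?php
// Hypothetical helper showing an extended-mode (x) pattern with comments.
function parse_url_head(string $text): ?array
{
    $pattern = '{
        \b
        (https?)://              # scheme
        ([-\w]+(?:\.[-\w]+)+)    # hostname: at least two dot-separated labels
        (?::(\d+))?              # optional port number
    }ix';

    if (!preg_match($pattern, $text, $m)) {
        return null;
    }

    // Trailing groups that did not take part in the match are left unset.
    return ['scheme' => $m[1], 'host' => $m[2], 'port' => $m[3] ?? null];
}

print_r(parse_url_head('visit https://example.net:1234/ now'));
```

In extended mode, literal whitespace in the pattern is ignored outside character classes, so a space that must be matched has to be written as \s, escaped, or placed inside a character class.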

+1
Jul 27 '09 at 13:29

This should get your email addresses:

    $string = "bah bah steve@gmail.com foo";
    $match = preg_match('/[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+(?:\.[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+)*\@[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+(?:\.[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+)+/', $string, $array);
    print_r($array);
    // outputs: Array ( [0] => steve@gmail.com )

+1
Jul 27 '09 at 13:41

I know an answer has already been accepted and this question is quite old, but this may be useful to other people looking for other implementations.

This is a modified version of the code published by: Angel.King.47 dated July 27, 2009:

    $text = preg_replace(
        array(
            '/(^|\s|>)(www.[^<> \n\r]+)/iex',
            '/(^|\s|>)([_A-Za-z0-9-]+(\\.[A-Za-z]{2,3})?\\.[A-Za-z]{2,4}\\/[^<> \n\r]+)/iex',
            '/(?(?=<a[^>]*>.+<\/a>)(?:<a[^>]*>.+<\/a>)|([^="\']?)((?:https?):\/\/([^<> \n\r]+)))/iex'
        ),
        array(
            "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>&nbsp;\\3':'\\0'))",
            "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>&nbsp;\\4':'\\0'))",
            "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\" target=\"_blank\">\\3</a>&nbsp;':'\\0'))",
        ),
        $text
    );


Changes:

  • I removed rules #2 and #3 (I'm not sure in which situations they are useful).
  • Removed email parsing, since I don't really need it.
  • I added one more rule that recognizes URLs of the form [domain]/* (without www). For example: "example.com/faq/" (multi-part TLDs: domain.{2-3}.{2-4}/)
  • When parsing strings that start with "http://", it removes that prefix from the link label.
  • Added target="_blank" to all links.
  • URLs can appear immediately after any (?) tag. For example: <b>www.example.com</b>

As Søren Løvborg has stated, this function does not escape the URLs. I tried his/her class, but it just did not work as I expected (if you don't trust your users, then try his/her code first).

+1
Apr 05 2018-12-12T00:

As I mentioned in one of the comments above, my VPS, which runs PHP 7, started emitting the warning "Warning: preg_replace(): The /e modifier is no longer supported, use preg_replace_callback instead". The buffer after the replacement was empty/false.
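
For context, the /e modifier made preg_replace evaluate the replacement string as PHP code; with preg_replace_callback the same logic moves into a function that receives the match array. Here is a minimal sketch of that migration for a single "www." rule. The function name is made up, and the pattern is a simplified variant of the earlier rule with the dot escaped and no HTML escaping, so treat it as an illustration only.

```php
<?php
// Sketch: one /e-style rule rewritten with preg_replace_callback.
// $m[1] is the leading whitespace (or empty at start of text),
// $m[2] is the bare "www." address.
function link_www(string $text): string
{
    return (string) preg_replace_callback(
        '/(^|\s)(www\.[^<> \n\r]+)/i',
        function (array $m): string {
            return $m[1] . '<a href="http://' . $m[2] . '">' . $m[2] . '</a>';
        },
        $text
    );
}

echo link_www('see www.example.com now'), "\n";
```

Because the callback receives the raw match rather than a quoted code string, the stripslashes() calls that the /e versions needed are no longer necessary in this form.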

I rewrote the code and made some improvements. If you think you should be listed in the authors section, feel free to edit the comment above the make_links_blank function. I'm intentionally not using the closing ?> tag, so as not to insert whitespace into the output.

    <?php

    class App_Updater_String_Util {
        public static function get_default_link_attribs( $regex_matches = [] ) {
            $t = ' target="_blank" ';
            return $t;
        }

        /**
         * App_Updater_String_Util::set_protocol();
         * @param string $link
         * @return string
         */
        public static function set_protocol( $link ) {
            if ( ! preg_match( '#^https?#si', $link ) ) {
                $link = 'http://' . $link;
            }

            return $link;
        }

        /**
         * Goes through text and makes whatever text that look like a link an html link
         * which opens in a new tab/window (by adding target attribute).
         *
         * Usage: App_Updater_String_Util::make_links_blank( $text );
         *
         * @param str $text
         * @return str
         * @see http://stackoverflow.com/questions/1188129/replace-urls-in-text-with-html-links
         * @author Angel.King.47 | http://dashee.co.uk
         * @author Svetoslav Marinov (Slavi) | http://orbisius.com
         */
        public static function make_links_blank( $text ) {
            $patterns = [
                '#(?(?=<a[^>]*>.+?<\/a>) (?:<a[^>]*>.+<\/a>) | ([^="\']?)((?:https?|ftp):\/\/[^<> \n\r]+) )#six' => function ( $matches ) {
                    $r1 = empty( $matches[1] ) ? '' : $matches[1];
                    $r2 = empty( $matches[2] ) ? '' : $matches[2];
                    $r3 = empty( $matches[3] ) ? '' : $matches[3];

                    $r2 = empty( $r2 ) ? '' : App_Updater_String_Util::set_protocol( $r2 );
                    $res = ! empty( $r2 ) ? "$r1<a href=\"$r2\">$r2</a>$r3" : $matches[0];
                    $res = stripslashes( $res );

                    return $res;
                },

                '#(^|\s)((?:https?://|www\.|https?://www\.)[^<>\ \n\r]+)#six' => function ( $matches ) {
                    $r1 = empty( $matches[1] ) ? '' : $matches[1];
                    $r2 = empty( $matches[2] ) ? '' : $matches[2];
                    $r3 = empty( $matches[3] ) ? '' : $matches[3];

                    $r2 = ! empty( $r2 ) ? App_Updater_String_Util::set_protocol( $r2 ) : '';
                    $res = ! empty( $r2 ) ? "$r1<a href=\"$r2\">$r2</a>$r3" : $matches[0];
                    $res = stripslashes( $res );

                    return $res;
                },

                // Remove any target attribs (if any)
                '#<a([^>]*)target="?[^"\']+"?#si' => '<a\\1',

                // Put the target attrib
                '#<a([^>]+)>#si' => '<a\\1 target="_blank">',

                // Make emails clickable Mailto links
                '/(([\w\-]+)(\\.[\w\-]+)*@([\w\-]+) (\\.[\w\-]+)*)/six' => function ( $matches ) {
                    $r = $matches[0];
                    $res = ! empty( $r ) ? "<a href=\"mailto:$r\">$r</a>" : $r;
                    $res = stripslashes( $res );

                    return $res;
                },
            ];

            foreach ( $patterns as $regex => $callback_or_replace ) {
                if ( is_callable( $callback_or_replace ) ) {
                    $text = preg_replace_callback( $regex, $callback_or_replace, $text );
                } else {
                    $text = preg_replace( $regex, $callback_or_replace, $text );
                }
            }

            return $text;
        }
    }

+1
Oct. 14 '16 at 9:26

Something along the lines of:

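    <?php
    // PHP has no "g" modifier, and the alternation needs grouping so that
    // "end of string" only applies after the URL.
    if (preg_match('@^http://(.*?)(?:\s|$)@', $textarea_url, $matches)) {
        echo '<a href="http://', $matches[1], '">', $matches[1], '</a>';
    }
    ?>
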
0
Jul 27 '09 at 13:30

This class turns the links in the text back into plain text while keeping the home URL link as it is. I hope this helps and saves you some time. Enjoy.

    class RegClass
    {
        function preg_callback_url($matches)
        {
            //var_dump($matches);
            // Get the matched link text: <a>text</a>
            $text = $matches[2];
            // Get the matched link attributes: <a href ="http://www.test.com">text</a>
            $url = $matches[1];

            if ($url == 'href ="http://www.test.com"') {
                // Keep this <a> tag as it is
                return '<a href=' . $url . ' rel="nofollow"> ' . $text . ' </a>';
            } else {
                // Replace every other <a> tag with its text
                return " $text ";
            }
        }

        function ParseText($text)
        {
            $text = preg_replace("/www\./", "http://www.", $text);

            $regex = "/http:\/\/http:\/\/www\./";
            $text = preg_replace($regex, "http://www.", $text);

            $regex2 = "/https:\/\/http:\/\/www\./";
            $text = preg_replace($regex2, "https://www.", $text);

            return preg_replace_callback('/<a\s(.+?)>(.+?)<\/a>/is', array($this, 'preg_callback_url'), $text);
        }
    }

    $regexp = new RegClass();
    echo $regexp->ParseText($text);

0
May 12 '13 at 15:48

If you want to trust the IANA, you can fetch their current official list of supported TLDs, like so:

    // Get the official list of valid TLDs.
    $validTLDs = explode("\n", file_get_contents('http://data.iana.org/TLD/tlds-alpha-by-domain.txt'));
    array_shift($validTLDs); // throw away first line containing meta data
    array_pop($validTLDs);   // throw away last element which is empty


This makes Søren Løvborg's Solution 2 a little less verbose and spares you the hassle of updating the list, now that new TLDs are being thrown around so carelessly ;)
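
One detail to watch: the IANA file lists TLDs in upper case and without a leading dot, whereas Solution 2 looks up lower-case keys such as ".com" with isset(). Here is a sketch of a conversion step; the helper name is made up, and the inline sample (including its comment line) stands in for the downloaded file so that the example runs offline.

```php
<?php
// Hypothetical helper: turn the body of tlds-alpha-by-domain.txt into a
// lookup table shaped like $validTlds in Solution 2 (".com" => true).
function build_tld_lookup(string $listBody): array
{
    $lookup = [];
    foreach (preg_split('/\R/', $listBody) as $line) {
        $line = trim($line);
        // Skip blank lines and comment lines starting with "#".
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        $lookup['.' . strtolower($line)] = true;
    }
    return $lookup;
}

// Made-up sample data; real use would pass the result of
// file_get_contents() on the IANA URL instead.
$sample = "# sample header line\nAC\nCOM\nXN--JXALPDLP\n";
$validTlds = build_tld_lookup($sample);
var_dump(isset($validTlds['.com']));
```

Fetching the list on every request would be slow and fragile, so caching the result locally is probably worth considering.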

0
Mar 05 '14 at 4:53

This worked for me (turned one of the answers into a PHP function)

    function make_urls_from_text($text)
    {
        return preg_replace('/(http[s]{0,1}\:\/\/\S{4,})\s{0,}/ims', '<a href="$1" target="_blank">$1 </a>', $text);
    }

0
Jul 21 '15 at 19:02

This should get your Twitter handles without touching email addresses: /(?<=^|(?<=[^a-zA-Z0-9-.]))@([A-Za-z]+[A-Za-z0-9]+)/i

-1
Aug 14 '13 at 10:12

While matching the full URL specification is difficult, here is a regular expression that usually does a good job:

    ([\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)

To use this in preg_replace, you need to escape it, like so:

    $pattern = "/([\\w-]+(\\.[\\w-]+)*@([a-z0-9-]+(\\.[a-z0-9-]+)*?\\.[a-z]{2,6}|(\\d{1,3}\\.){3}\\d{1,3})(:\\d{4})?)/";
    $replaced_texttext = preg_replace($pattern, '<a href="$0" title="$0">$0</a>', $text);

-2
Jul 27 '09 at 13:30


