Get all URLs in a string with PHP

I am trying to figure out a way to get an array of URLs from a string of text. The text will be formatted roughly like this:

Some random text here

http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphones-bezel-a-massive-notification-light/?grcc=88888Z0ZwdgtZ0Z0Z0Z0Z0&grcc2=835637c33f965e6cdd34c87219233711~1342828462249~fca4fa8af1286d8a77f26033fdeed202~510f37324b14c50a5e9121f955fac3fa~1342747216490~0~0~0~0~0~0~0~0~7~3~

http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tickets-for-disrupt-sf/

Obviously, these links can be anything, and there can be many of them; these are just the ones I am testing with now. If I use a simple URL, my regular expression works fine.

I use:

preg_match_all('((https?|ftp|gopher|telnet|file|notes|ms-help):'. '((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)', $bodyMessage, $matches, PREG_PATTERN_ORDER); 

When I run print_r($matches), I get:

 Array ( [0] => Array ( [0] => http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphon= [1] => http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick= [2] => http://techcrunch.co= [3] => http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-ip= [4] => http://techcrunch.com/2012/07/20/last-day-to-purc= [5] => http://tec= ) ... 

None of the elements in this array contains a complete URL from the links above.

Does anyone know a good way to get what I need? I found a bunch of regex resources for extracting links with PHP, but none of them work.

Thanks!

Edit:

OK, so I pull these links from an email. The script parses the email, captures the message body, and then tries to extract the links from it. After examining the email, it looks as though something inserts a space in the middle of each URL. Here is the message body as my PHP script sees it:

  --00248c711bb99ca36d04c54ba5c6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphon= es-bezel-a-massive-notification-light/?grcc=3D88888Z0ZwdgtZ0Z0Z0Z0Z0&grcc2= =3D835637c33f965e6cdd34c87219233711~1342828462249~fca4fa8af1286d8a77f26033f= deed202~510f37324b14c50a5e9121f955fac3fa~1342747216490~0~0~0~0~0~0~0~0~7~3~ http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick= ets-for-disrupt-sf/ --00248c711bb99ca36d04c54ba5c6 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable 
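The body above carries the header "Content-Transfer-Encoding: quoted-printable". In quoted-printable encoding (RFC 2045), a "=" at the end of a line is a soft line break that joins that line to the next one, and "=3D" stands for a literal "="; the apparent space inside each URL is most likely that line break rendered as whitespace. PHP's built-in quoted_printable_decode() undoes both, so decoding the body before running a regex should give intact URLs. The sketch below is a minimal illustration under stated assumptions: the text/plain part has already been isolated into a string, its lines end in CRLF, the sample string is rebuilt from the body shown above, and the helper name extractUrls() is hypothetical.

```php
<?php
// Hypothetical helper: decode a quoted-printable text body, then pull out URLs.
function extractUrls(string $encodedBody): array
{
    // Undo the transfer encoding: "=" + CRLF soft line breaks are removed
    // and escapes such as "=3D" are turned back into "=".
    $body = quoted_printable_decode($encodedBody);

    // Deliberately simple pattern: http:// or https:// followed by any run
    // of characters that are neither whitespace nor a double quote.
    preg_match_all('~https?://[^\s"]+~i', $body, $matches);

    return $matches[0];
}

// Sample rebuilt from the body above (assumption: CRLF line endings).
$encoded = "http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphon=\r\n"
    . "es-bezel-a-massive-notification-light/?grcc=3D88888Z0ZwdgtZ0Z0Z0Z0Z0&grcc2=\r\n"
    . "=3D835637c33f965e6cdd34c87219233711~1342828462249~fca4fa8af1286d8a77f26033f=\r\n"
    . "deed202~510f37324b14c50a5e9121f955fac3fa~1342747216490~0~0~0~0~0~0~0~0~7~3~\r\n"
    . "\r\n"
    . "http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick=\r\n"
    . "ets-for-disrupt-sf/\r\n";

print_r(extractUrls($encoded));
```

Decoding first keeps the regex simple: it never has to know about "=" continuations or "=3D" escapes, so any of the URL patterns discussed here can be applied to the decoded text.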

Any suggestions on how to stop it from breaking the URLs?

Edit 2:

As suggested by Laurnet, I ran this code:

  $bodyMessage = str_replace("= ", "",$bodyMessage); 

However, when I run the script again, it does not seem to replace the "=":

  --00248c711bb99ca36d04c54ba5c6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphon= es-bezel-a-massive-notification-light/?grcc=3D88888Z0ZwdgtZ0Z0Z0Z0Z0&grcc2= =3D835637c33f965e6cdd34c87219233711~1342828462249~fca4fa8af1286d8a77f26033f= deed202~510f37324b14c50a5e9121f955fac3fa~1342747216490~0~0~0~0~0~0~0~0~7~3~ http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick= ets-for-disrupt-sf/ --00248c711bb99ca36d04c54ba5c6 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable 
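A likely reason str_replace("= ", "", ...) changes nothing: in the raw body the character after each trailing "=" is probably a line break (RFC 2045 specifies CRLF), not a space. It only looks like a space when the body is printed somewhere that collapses newlines, for example a browser. The sketch below checks which sequence is actually present and then strips the soft breaks by hand with preg_replace(). Assumptions: the sample string is the second URL from the body above with CRLF line endings inserted, and the variable names are illustrative only. Note that this manual approach still leaves "=3D" escapes undecoded, whereas quoted_printable_decode() handles both.

```php
<?php
// Sample taken from the second URL in the body above, assuming CRLF line
// endings: the soft line break is "=" followed by CRLF, not "=" plus a space.
$raw = "http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick=\r\n"
     . "ets-for-disrupt-sf/\r\n";

// Diagnose which sequence is really present.
var_dump(strpos($raw, "= ") !== false);    // bool(false): nothing for str_replace("= ", ...) to remove
var_dump(strpos($raw, "=\r\n") !== false); // bool(true)

// Manual alternative: drop "=" plus optional trailing spaces/tabs plus CRLF or LF.
// This does NOT decode escapes such as "=3D"; quoted_printable_decode() does both.
$joined = preg_replace('/=[ \t]*\r?\n/', '', $raw);

echo $joined; // the URL is now on a single line
```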
2 answers
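  /**
   * Get URLs from a string (the string may itself be a URL).
   *
   * @param string $string
   * @return array
   */
  function getUrls($string)
  {
      // Stop at whitespace (including line breaks) or a double quote,
      // so URLs on consecutive lines are not merged into one match.
      $regex = '/https?:\/\/[^"\s]+/i';
      preg_match_all($regex, $string, $matches);
      //return (array_reverse($matches[0]));
      return $matches[0];
  }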

Use the following regular expression instead.

 $regex = "~(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))~u";

Hope this helps.
