Negative lookbehind and greedy quantifiers in PHP

I use a regex to find URLs in text and turn them into links. However, I don't want to link URLs that are already linked, so I use a negative lookbehind to check whether the URL is preceded by href=. This fails because variable-length quantifiers are not allowed inside a lookbehind in PHP (PCRE).

Here's the regex to match:

/\b(?<!href\s*=\s*[\'\"])((?:http:\/\/|www\.)\S*?)(?=\s|$)/i

What is the best way to solve this problem?
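
The failure described in the question can be reproduced with a short script. This is only a minimal sketch: the sample text is made up, and the pattern is the one from the question with the quote class written as [\'"]. Because \s* is unbounded, PCRE refuses to compile the lookbehind and preg_match() returns false.

```php
<?php
// The question's pattern: note the unbounded \s* inside the lookbehind.
$pattern = '/\b(?<!href\s*=\s*[\'"])((?:http:\/\/|www\.)\S*?)(?=\s|$)/i';

// @ suppresses the compilation warning so the return value can be inspected.
// The subject string is an invented example.
$result = @preg_match($pattern, 'see www.example.com here');

// preg_match() returns false when the pattern cannot be compiled.
var_dump($result === false);
```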

EDIT:

I still need to test it, but I think the trick for doing this in a single regex is to use the conditional subpatterns that PCRE supports. It would look something like this:

/(href\s*=\s*[\'\"])?(?(1)^|)((?:http:\/\/|www\.)\w[\w\d\.\/]*)(?=\s|$)/i

The key point is that when href is captured, the match is thrown out immediately by the conditional (?(1)^|): if group 1 matched, the pattern then requires ^ (start of string), which cannot match at that position. There is probably something wrong with this. I'll check it tomorrow.

+5
3 answers

I tried to do the same thing the other way around: make sure the URL is not followed by ">:

/((?:http:\/\/|www\.)(?:[^"\s]|"[^>]|(*FAIL))*?)(?=\s|$)/i

But to me it looks pretty hacky; I'm sure it can be done better.

My second approach is closer to yours (and therefore more accurate):

/href\s*=\s*"[^"]*"(*SKIP)(*FAIL)|((?:http:\/\/|www\.)\S*?)(?=\s|$)/i

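If href="..." is found, (*SKIP)(*FAIL) is applied: the match fails at that point, and the engine resumes its search from the position it had reached when it encountered (*SKIP), that is, just after the closing quote. The URL inside the attribute is therefore never tried by the second alternative.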
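
To see the effect of (*SKIP)(*FAIL) in practice, here is a minimal sketch that feeds the pattern above to preg_replace_callback(). The function name linkify, the sample text, and the choice to prefix www. addresses with http:// are my own assumptions, not part of the answer.

```php
<?php
// Hypothetical helper: wraps bare URLs in <a> tags and leaves
// URLs that already sit inside href="..." untouched.
function linkify(string $html): string
{
    // First alternative: consume href="..." and discard it via (*SKIP)(*FAIL).
    // Second alternative: capture a bare URL up to the next whitespace or end.
    $pattern = '/href\s*=\s*"[^"]*"(*SKIP)(*FAIL)|((?:http:\/\/|www\.)\S*?)(?=\s|$)/i';

    return preg_replace_callback($pattern, function (array $m): string {
        $url = $m[1];
        // Assumption: www. addresses get an http:// scheme in the href.
        $href = stripos($url, 'http://') === 0 ? $url : 'http://' . $url;
        return '<a href="' . htmlspecialchars($href) . '">'
            . htmlspecialchars($url) . '</a>';
    }, $html);
}

echo linkify('Visit www.example.com or <a href="http://php.net">PHP</a> today');
```

Without the first alternative, the second one would start matching at http://php.net inside the attribute and run on to the next whitespace, swallowing part of the existing tag.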


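+1

An alternative is to look for the URLs that are already linked, that is, for the anchor tags themselves. To find URLs that are already linked, something like this could be used:

/<a([\s]+[\w="]+)*[\s]+href[\s]*=[\s]*"([\w\s:\/.?+&=]+)"([\s]+[\w="]+)*>/i

I tried it on http://regexpal.com/. It looks for <a, then any attributes, then href and its quoted value, then any further attributes, and finally the closing >. The link target (the URL) is captured in the second group by [\w\s:\/.?+&=]+.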
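
As a rough sketch of how this pattern might be used from PHP, the snippet below collects the href values of existing anchors with preg_match_all(). The function name linkedUrls and the sample markup are assumptions for illustration. The / inside the character class is escaped because / is also the pattern delimiter here.

```php
<?php
// Hypothetical helper: returns the URLs that are already linked in $html.
function linkedUrls(string $html): array
{
    // Group 2 holds the href value; groups 1 and 3 are other attributes.
    $pattern = '/<a([\s]+[\w="]+)*[\s]+href[\s]*=[\s]*"([\w\s:\/.?+&=]+)"([\s]+[\w="]+)*>/i';

    preg_match_all($pattern, $html, $matches);

    return $matches[2];
}

print_r(linkedUrls('<a class="x" href="http://example.com/page?id=1">link</a>'));
```

Note that the character class does not include every character a URL may contain (a hyphen, for example), so anchors with such URLs would be missed unless the class is extended.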

0
