Regex help: includes and excludes

I need help with regex.

I am trying to create an expression that will match certain lines and exclude others.

For example:

I would like to include any URL containing mobility, such as http://www.something.com/mobility/

However, I would like to exclude any URL containing store, such as http://www.something.com/store/mobility/

FYI, I have many keywords that I use for inclusion. I currently include them like this: /mobility|enterprise|products/i, but I cannot find a way to exclude links containing other keywords.

Thank you for any help and understanding you can provide.

_t

+8
regex regex-negation
3 answers

You can do all of this in one regex, but you don't really need to. I think you would be better off running two separate tests: one for your include rules and one for your exclude rules. I'm not sure which language you are using, so I will use JavaScript for the example:

    function validate(str) {
        var required = /\b(mobility|enterprise|products)\b/i;
        var blocked = /\b(store|foo|bar)\b/i;
        return required.test(str) && !blocked.test(str);
    }

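If you really want to do this in one pattern, try something like this. The ^ anchor makes both lookaheads scan the whole string; without it, the engine can retry from a position past a blocked word and still match:

    /^(?=.*\b(mobility|enterprise|products)\b)(?!.*\b(store|foo|bar)\b)(.+)/i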

The i at the end means case-insensitive, so use your language's equivalent if you are not using JavaScript.
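A quick check of the single-pattern approach against the two URLs from the question may help here. This is only a sketch: the names anchored, unanchored and blockedUrl are illustrative, and the pattern uses a ^ anchor at the start. It also shows why the anchor matters, since the same pattern without ^ can start matching after "store" and let the blocked URL through.

```javascript
// Single-pattern filter, anchored at the start of the string.
const anchored = /^(?=.*\b(mobility|enterprise|products)\b)(?!.*\b(store|foo|bar)\b)(.+)/i;

// Same pattern without the anchor, for comparison only.
const unanchored = /(?=.*\b(mobility|enterprise|products)\b)(?!.*\b(store|foo|bar)\b)(.+)/i;

const blockedUrl = "http://www.something.com/store/mobility/";

console.log(anchored.test("http://www.something.com/mobility/")); // true
console.log(anchored.test(blockedUrl));   // false: "store" is seen from position 0
console.log(unanchored.test(blockedUrl)); // true: a match can start at "tore/mobility/"
```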

All that said, based on your description of the problem, I think what you really want is plain string matching. Here is an example using JS again:

    function validate(str) {
        var required = ['mobility', 'enterprise', 'products'];
        var blocked = ['store', 'foo', 'bar'];
        var lowercaseStr = str.toLowerCase(); // or just use str if you want case sensitivity

        // Reject the string if it contains any blocked keyword.
        for (var j = 0; j < blocked.length; j++) {
            if (lowercaseStr.indexOf(blocked[j]) !== -1) {
                return false;
            }
        }

        // Accept it if it contains at least one required keyword.
        for (var i = 0; i < required.length; i++) {
            if (lowercaseStr.indexOf(required[i]) !== -1) {
                return true;
            }
        }

        return false;
    }
+4

To match a string that must contain a word from a set of words, you can use a positive lookahead like:

 ^(?=.*(?:inc1|inc2|...)) 

To avoid matching a line that contains a word from the stop-word list, you can use a negative lookahead like:

 ^(?!.*(?:ex1|ex2|...)) 

You can combine the two requirements above into one regular expression:

 ^(?=.*(?:inc1|inc2|...))(?!.*(?:ex1|ex2|...))REGEX_TO_MATCH_URL$ 
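Since the question mentions many keywords, the combined pattern could be built from two lists rather than typed by hand. The following is a minimal sketch under some assumptions: escapeRegex, buildFilter and urlFilter are hypothetical names, `.*` stands in for REGEX_TO_MATCH_URL, and both lists are assumed to be non-empty (an empty exclude list would make the negative lookahead reject everything).

```javascript
// Escape regex metacharacters so each keyword is matched literally.
function escapeRegex(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build ^(?=.*(?:inc1|inc2))(?!.*(?:ex1|ex2)).*$ from two keyword lists.
// Assumes both lists contain at least one keyword.
function buildFilter(includes, excludes) {
  const inc = includes.map(escapeRegex).join("|");
  const exc = excludes.map(escapeRegex).join("|");
  return new RegExp(`^(?=.*(?:${inc}))(?!.*(?:${exc})).*$`, "i");
}

const urlFilter = buildFilter(["mobility", "enterprise", "products"], ["store"]);

console.log(urlFilter.test("http://www.something.com/mobility/"));       // true
console.log(urlFilter.test("http://www.something.com/store/mobility/")); // false
```

Escaping the keywords is a precaution for entries such as "c++" or "v1.2", which would otherwise be read as regex syntax.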

Rubular link

+11

Make two regular expressions, one for good and one for bad, and check both (first bad, then good). You can do this with a single regex, but KISS is always a good rule (http://en.wikipedia.org/wiki/KISS_principle).

I will add that you need to consider the "ass" problem: .*ass matches ambassador and cassette, so you probably want a delimiter ([./\\]) before and after each word. See "Obscenity Filters: Bad Idea, or Incredibly Intercoursing Bad Idea?"
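The same substring problem applies to the keywords in the question: a bare "store" also matches "restored". A rough sketch of the delimiter idea follows, assuming the keywords contain no regex metacharacters; delimited, loose and strict are hypothetical names.

```javascript
// Require one of the delimiters . / \ on both sides of the keyword.
// Assumes the words contain no regex metacharacters.
function delimited(words) {
  return new RegExp(`[./\\\\](?:${words.join("|")})[./\\\\]`, "i");
}

const loose = /store/i;
const strict = delimited(["store"]);

console.log(loose.test("http://www.something.com/restored/mobility/"));  // true (false positive)
console.log(strict.test("http://www.something.com/restored/mobility/")); // false
console.log(strict.test("http://www.something.com/store/mobility/"));    // true
```

One trade-off: a keyword at the very end of a URL with no trailing slash has no closing delimiter, so this pattern would miss it.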

+2