Match everything but something?

Question


With regex, how can I match everything in a string that is not something? This may not make sense, but read on.

So, take the word baby, for example: to match everything that is not b, you would do something like [^b], and that would match a and y. Simple enough! But in the line Ben sits on a bench, how can I match everything that is not ben, so that I end up matching sits on a ch?
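A quick check of the negated character class described above. Python is only an arbitrary choice here; the question does not name a language.

```python
import re

word = "baby"
# [^b] is a negated character class: it matches any single character except "b"
not_b = re.findall(r"[^b]", word)
print(not_b)  # ['a', 'y']
```

A character class only ever excludes single characters, which is why the same trick cannot exclude the three-letter sequence ben.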

Better yet, how can I match everything that is not a pattern? For example, in 1a2be3, match everything that is not number,letter,number, so that it matches every combination in the string except 1a2?


regex

Srb1313711 Dec 10 '13 at 10:52


6 answers

rednaw · Answer 1 · 2013-12-17T09:58:29+0000

 (?:ben)|(.)

This regex matches either ben or any other single character, but only the other characters are captured; ben itself is not. This way you get many matches whose captured text never contains ben. You can then join all these captures to get a string without ben.

Here is an example in Python.

 import re
 thestr = "Ben sits on a bench"
 regex = r'(?:ben)|(.)'
 # findall returns group 1 only: '' where "ben" matched, the character otherwise
 matches = re.findall(regex, thestr, re.IGNORECASE)
 print(''.join(matches))

This will print:

  sits on a ch

Note the leading space. You can, of course, get rid of it with .strip().

Also note that it is probably faster to simply replace ben with an empty string to get the same result. But if you want to use this technique as part of a more complex regex, it might come in handy.

And of course, you can put a more complex pattern in place of ben, so the number,letter,number example would look like this:

 (?:[0-9][a-z][0-9])|(.)
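A minimal sketch applying the same findall-and-join technique to the 1a2be3 example, using [a-z] for the letter class (the case-insensitive flag is carried over from the ben example and is not strictly needed here):

```python
import re

text = "1a2be3"
# Alternative 1 consumes digit-letter-digit without capturing it;
# alternative 2 captures any other single character in group 1.
regex = r"(?:[0-9][a-z][0-9])|(.)"
matches = re.findall(regex, text, re.IGNORECASE)
result = "".join(matches)
print(matches)  # ['', 'b', 'e', '3']
print(result)   # be3
```

Note that this removes 1a2 from the string; it does not enumerate every substring that avoids the pattern.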

Adam katz · Answer 2 · 2014-01-28T09:09:16+0000

Short answer: you can't do what you're asking. Technically, the first part has an ugly answer, but the second part (as I understand it) has none.

For your first part, I have a rather impractical (but pure-regex) answer; anything better requires code (e.g. @rednaw's much cleaner answer above). I extended the test string to make it more complete. (For simplicity, I use grep -Pio: PCRE, case-insensitive, printing each match on its own line.)

 $ echo "Ben sits on a bench better end" \
     |grep -Pio '(?=b(?!en)|(?<!b)en|e(?!n)|(?<!be)n|[^ben])\w+'
 sits
 on
 a
 ch
 better
 end

I basically special-case each letter of "ben", so that I only include letters that are not themselves part of the string "ben". As I said, it's not very practical, even if it technically answers your question. I also saved a step-by-step explanation of this regex if you would like more details.

If you are forced to use a pure regular expression rather than code, the best option for this kind of thing is to write code that generates the regular expression. That way you can keep a clean copy.
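A sketch of generating such a regex in code. The helper name build_not_word_regex is hypothetical, and the construction is a variant of the one above, not a copy of it: instead of a lookahead followed by \w+ (which only checks the first character of each match), it checks every character with (?:...)+. A character of the word is rejected only when the text before it ends with the word's prefix and the text from it onward starts with the rest of the word.

```python
import re

def build_not_word_regex(word):
    """Hypothetical helper: return a pattern whose matches are maximal runs
    of characters that are not part of any occurrence of `word`."""
    letters = sorted(set(word))
    alternatives = []
    for ch in letters:
        guards = []
        for i, c in enumerate(word):
            if c != ch:
                continue
            prefix, rest = word[:i], word[i:]
            # ch is inside an occurrence here only if the text before it ends
            # with `prefix` AND the text starting at ch begins with `rest`;
            # require that at least one of the two is false.
            options = [f"(?!{re.escape(rest)})"]
            if prefix:
                options.insert(0, f"(?<!{re.escape(prefix)})")
            guards.append("(?:" + "|".join(options) + ")")
        alternatives.append("".join(guards) + re.escape(ch))
    # Characters that never appear in `word` are always allowed.
    alternatives.append("[^" + "".join(re.escape(c) for c in letters) + "]")
    return "(?:" + "|".join(alternatives) + ")+"

pattern = build_not_word_regex("ben")
print(pattern)
pieces = re.findall(pattern, "Ben sits on a bench better end", re.IGNORECASE)
print(pieces)  # [' sits on a ', 'ch better end']
```

Because every character is checked, spaces are kept and a string such as xbenx yields ['x', 'x'] rather than one match spanning ben.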

I'm not sure what you are asking for in the second part of your question; a regular expression is either greedy or lazy [1] [2], and I don't know of any implementation that can find "every combination" rather than just one match at each position. Even if one did, it would be very slow on real-world input (as opposed to quick examples); regex engines would become unbearably slow if they were forced to explore every possibility, which is essentially ReDoS.

Examples:

 # greedy evaluation (default)
 $ echo 1a2be3 |grep -Pio '(?!\d[a-z]\d)\w+'
 a2be3
 # lazy evaluation
 $ echo 1a2be3 |grep -Pio '(?!\d[a-z]\d)\w+?'
 a
 2
 b
 e
 3

I assume you are looking for 1 1a a a2 a2b a2be a2be3 2 2b 2be 2be3 b be be3 e e3 3, but I don't think you can get this with a pure regex. You will need code to generate each substring, and then you can use the regular expression to filter out the forbidden pattern (again, it's all about greedy vs. lazy vs. ReDoS).
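A minimal sketch of the code-based approach just described: generate every substring, then use the regex only as a filter. The function name substrings_without is hypothetical.

```python
import re

def substrings_without(text, forbidden):
    """Return every substring of `text` that contains no match of `forbidden`."""
    pat = re.compile(forbidden)
    result = []
    # There are O(n^2) substrings, so this only suits short strings.
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            sub = text[start:end]
            if not pat.search(sub):
                result.append(sub)
    return result

subs = substrings_without("1a2be3", r"\d[a-z]\d")
print(" ".join(subs))
# 1 1a a a2 a2b a2be a2be3 2 2b 2be 2be3 b be be3 e e3 3
```

The quadratic loop is exactly the cost a regex engine would face if it had to report "every combination" itself.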

hillel · Answer 3 · 2013-12-10T11:08:27+0000

If you want to match every word except one, you can use a negative lookahead: \b(?!ben\b)\w+\b, but Jon's comment seems to be the simplest answer to your exact question.
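A short Python illustration of the negative-lookahead approach, assuming case-insensitive matching so that the capitalized Ben is excluded. It uses \w+ rather than \w*, because \w* also produces empty matches at the end of each word.

```python
import re

text = "Ben sits on a bench"
# (?!ben\b) rejects a word only when the whole word is "ben",
# so "bench" is kept as a word of its own.
words = re.findall(r"\b(?!ben\b)\w+\b", text, re.IGNORECASE)
print(words)  # ['sits', 'on', 'a', 'bench']
```

This works on whole words, so it answers a slightly different question than removing the letters ben from inside bench.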

Ronin · Answer 4 · 2014-01-07T22:20:57+0000

Well, the simplest thing to do is to match everything:

 (.*?)

Then, on the matched text, run a second match for what you don't want (for example, in Perl the text of the last successful match is available in the $& variable).

If this second match succeeds, it is not what you want; otherwise, you have a match.

Simply A and not B, where A is everything (.*?) and B is what you don't want. So you make two matches, but I think that's fine.
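A sketch of the two-match idea in Python. As an assumption, A is taken to be \w+ (one word at a time), because a lazy .*? on its own matches the empty string; B is ben, matched case-insensitively.

```python
import re

text = "Ben sits on a bench"
# Match A: collect candidate pieces (assumption: A = \w+).
candidates = re.findall(r"\w+", text)
# Match B: run a second match for the unwanted pattern and drop any hit.
kept = [c for c in candidates if not re.search(r"ben", c, re.IGNORECASE)]
print(kept)  # ['sits', 'on', 'a']
```

Note that any candidate containing B is rejected as a whole, so bench disappears entirely.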

Bohemian · Answer 5 · 2014-01-07T23:20:53+0000

Just replace everything that matches your pattern with nothing (to remove it).

You did not say which language you use, so generically:

 s/ben//g

and your other example:

 s/\d[a-zA-Z]\d//g
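The same two substitutions in Python, as one possible translation. The case-insensitive flag on the first one is an addition so that the leading Ben is removed too; s/ben//g as written is case-sensitive.

```python
import re

# s/ben//g, made case-insensitive to match the question's example
no_ben = re.sub(r"ben", "", "Ben sits on a bench", flags=re.IGNORECASE)
# s/\d[a-zA-Z]\d//g
no_triplet = re.sub(r"\d[a-zA-Z]\d", "", "1a2be3")
print(repr(no_ben), repr(no_triplet))  # ' sits on a ch' 'be3'
```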

Yuriy kovalev · Answer 6 · 2014-01-18T11:51:47+0000

If you need a list of strings, use "split on regexp" instead of "match on regexp".
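A minimal Python illustration of splitting instead of matching: the pieces between occurrences of the pattern are returned, and any empty strings can be filtered out afterwards.

```python
import re

parts = re.split(r"ben", "Ben sits on a bench", flags=re.IGNORECASE)
print(parts)   # ['', ' sits on a ', 'ch']
pieces = re.split(r"\d[a-z]\d", "1a2be3")
print(pieces)  # ['', 'be3']
# Drop the empty strings left where the pattern touched the string's edges
non_empty = [p for p in parts if p]
```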

