Regex lookahead discards a match

Question

I am trying to write a regex that discards a match completely, using a lookahead.

\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*

This is the pattern I am matching with, and this is my regex101 test.

But when an email starts with -, _ or ., it should not match at all, rather than matching with only the leading characters dropped. Any ideas are welcome. I have been searching for the last half hour, but cannot figure out how to discard the whole email when it starts with one of those characters.
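To make the problem concrete, here is a minimal sketch of the behaviour described above. It uses Python's re module purely for illustration (an assumption on my part; the question is tagged C#), and the sample addresses are made up.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*")

def find_emails(text):
    """Return every substring of text that the question's pattern matches."""
    return [m.group(0) for m in PATTERN.finditer(text)]

# The leading "-" or "." is skipped and the rest of the address still matches.
print(find_emails("-john@example.com"))  # ['john@example.com']
print(find_emails(".john@example.com"))  # ['john@example.com']
# "_" is a word character, so \w+ keeps it as part of the match.
print(find_emails("_john@example.com"))  # ['_john@example.com']
```

The engine simply restarts the search one character later, which is why the address is trimmed instead of rejected.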

c# regex

dev May 14, '15 at 8:27

2 answers

I use this for multiple email addresses separated by ';':

 ([A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4};)*

For one mail:

 [A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}
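A quick way to try these two patterns is sketched below. It assumes Python's re module and writes the patterns without any whitespace inside them; the helper names and sample addresses are hypothetical.

```python
import re

# Single-address pattern, with no whitespace inside it.
SINGLE = r"[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}"
# List pattern: every address must be followed by ';'.
MULTI = re.compile(r"(" + SINGLE + r";)*")

def is_single_address(text):
    """True if the whole string is one address according to SINGLE."""
    return re.fullmatch(SINGLE, text) is not None

def is_address_list(text):
    """True if the whole string is a run of ';'-terminated addresses."""
    return MULTI.fullmatch(text) is not None

print(is_address_list("a@b.com;c@d.org;"))     # True
print(is_address_list("a@b.com;c@d.org"))      # False: last ';' is missing
print(is_single_address("-john@example.com"))  # True
```

Note that the first character class contains '-', '_' and '.', so on its own this pattern still accepts an address that starts with one of them.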

Piero Alberto May 14, '15 at 9:11

Wiktor Stribiżew · Accepted Answer · 2015-05-14T08:39:52+0000

You can use a word boundary next to @ together with a positive lookbehind that checks whether we are at the beginning of the string or right after a whitespace character, and then check that the first character is not one of the unwanted ones by using the negated class [^\s\-_.] :

 (?<=^|\s)[^\s\-_.]\w*(?:[-+.]\w+)*\b@\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*

See demo

Match List:

 support@github.com
 s.miller@mit.edu
 j.hopking@york.ac.uk
 steve.parker@soft.de
 info@company-hotels.org
 kiki@hotmail.co.uk
 no-reply@github.com
 s.peterson@mail.uu.net
 info-bg@software-software.software.academy

Additional usage notes and alternative notations

Note that it is best to use as few escapes as possible in a regular expression, so [^\s\-_.] can be written as [^\s_.-] : a hyphen at the end of a character class still denotes a literal hyphen, not a range. In addition, if you plan to use the pattern with other regex engines, you may run into problems with alternation inside a lookbehind; in that case you can replace (?<=\s|^) with the equivalent (?<!\S) . See this regex :

 (?<!\S)[^\s_.-]\w*(?:[-+.]\w+)*\b@\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*
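As an illustration of the portability point, here is a small sketch using Python's re module (my choice for the example, not part of the original question, which is tagged C#). Python's re is one engine that rejects the alternation inside the lookbehind, because the two branches have different widths, while it accepts the (?<!\S) form. The rejected sample addresses are made up.

```python
import re

# Email part of the pattern, written with no whitespace inside it.
EMAIL = r"[^\s_.-]\w*(?:[-+.]\w+)*\b@\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*"

def lookbehind_alternation_compiles():
    # Try the (?<=^|\s) form; Python's re needs a fixed-width lookbehind.
    try:
        re.compile(r"(?<=^|\s)" + EMAIL)
        return True
    except re.error:
        return False

# The negative lookbehind is a single fixed-width check, so it compiles.
PATTERN = re.compile(r"(?<!\S)" + EMAIL)

def find_emails(text):
    """Return all addresses that start at the string start or after whitespace."""
    return [m.group(0) for m in PATTERN.finditer(text)]

print(lookbehind_alternation_compiles())  # False with Python's re
print(find_emails("support@github.com -bad@x.com .bad@x.com _bad@x.com no-reply@github.com"))
# ['support@github.com', 'no-reply@github.com']
```

Addresses starting with -, . or _ are dropped as a whole: the search cannot restart one character later, because that position is preceded by a non-whitespace character.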

Last but not least, if you need to use it in JavaScript or another language that does not support lookbehind, replace (?<!\S) / (?<=\s|^) with a (non-)capturing group (\s|^) , wrap the whole email part of the pattern in another set of capturing parentheses, and use the language's own tools to grab the contents of the email group (group 2 in the pattern below, or group 1 if the first group is made non-capturing):

 (\s|^)([^\s_.-]\w*(?:[-+.]\w+)*\b@\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*)

See the regex demo.
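The group-based variant can be sketched as follows, again in Python's re only for illustration (the engine and the sample input are assumptions). With a capturing (\s|^) in front, the address itself ends up in group 2; making the first group non-capturing would move it to group 1.

```python
import re

# Email part of the pattern, written with no whitespace inside it.
EMAIL = r"[^\s_.-]\w*(?:[-+.]\w+)*\b@\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*"

# Group 1 consumes the leading whitespace (or matches at the string start);
# group 2 holds the address.
PATTERN = re.compile(r"(\s|^)(" + EMAIL + r")")

def find_emails(text):
    """Return the contents of group 2 for every match in text."""
    return [m.group(2) for m in PATTERN.finditer(text)]

print(find_emails("-bad@x.com ok@x.com kiki@hotmail.co.uk"))
# ['ok@x.com', 'kiki@hotmail.co.uk']
```

Unlike the lookbehind versions, the overall match here includes the preceding whitespace character, which is why the group is read rather than the whole match.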
