Regular expression with an OR (|) applies the second alternative on the first match?

I had a situation where I needed to remove parts of a string, and I decided that I could use regex for this.

The test case looks similar to this:

 LINDA L
 LINDSAY GRIFFIN
 LINDSAY LIGHTHOUSE
 LINDSAY PETERSON

I want to remove the trailing L\b from the first line, or the leading L.*?\b from the remaining lines, which should leave me with this:

 LINDA
 GRIFFIN
 LIGHTHOUSE
 PETERSON

L\b|L.*?\b deletes the entire first and third lines (except for the space), which is not what I want. Is there a way to do this with a single expression? I assumed that once the first alternative matched, the engine would not go on to the second.
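To make the behaviour concrete, here is a minimal sketch in Python (used only for illustration; the thread itself is about CFML, whose regex engine may differ in details). It assumes each name is a separate string and that the replacement is global. The alternation is tried afresh at every position: at the start of "LINDA L" the first alternative fails, so the second one consumes "LINDA", and the scan then carries on and removes the trailing "L" as well.

```python
import re

# The four sample strings from the question, one per line.
names = ["LINDA L", "LINDSAY GRIFFIN", "LINDSAY LIGHTHOUSE", "LINDSAY PETERSON"]

# The pattern from the question: both alternatives are tried at each position.
pattern = re.compile(r"L\b|L.*?\b")

for name in names:
    # List every match a global replace would remove, then show the result.
    matches = [m.group(0) for m in pattern.finditer(name)]
    print(repr(name), "->", matches, "->", repr(pattern.sub("", name)))
```

The first and third strings each produce two matches, which is why only the space survives there.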

Thanks, everyone. We ended up using plain CF conventions and two replacements instead of one complex regex.

+4
4 answers

I think this does what you want to do:

 (\bL$)|((?!.*\bL$)^L.*?\b) 

To explain, (\bL$) matches the first pattern: the word boundary, then L, and then the end of the line.

((?!.*\bL$)^L.*?\b) matches an L at the beginning of the line, followed by the rest of the word (.*?\b, as you used, is a reasonable pattern for getting to the end of the word). The (?!.*\bL$) part is a negative lookahead, which prevents a match if the pattern following ?! matches. In this case, it prevents a match if the line ends in a standalone L (\bL$).
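As a quick check of the explanation above, here is a small Python sketch (illustration only; the question targets CFML, and `strip_once` is a hypothetical helper name). It assumes each name is handled as its own string and that only one replacement is made per string.

```python
import re

# Combined pattern from this answer: a trailing standalone L, or a leading
# L-word that is only allowed when the line does not end in a standalone L.
combined = re.compile(r"(\bL$)|((?!.*\bL$)^L.*?\b)")

def strip_once(name):
    """Apply the combined pattern once to a single string (hypothetical helper)."""
    return combined.sub("", name, count=1)

for name in ["LINDA L", "LINDSAY GRIFFIN", "LINDSAY LIGHTHOUSE", "LINDSAY PETERSON"]:
    print(repr(name), "->", repr(strip_once(name)))
```

Note that the adjacent space is left in place, so "LINDA L" becomes "LINDA " and "LINDSAY GRIFFIN" becomes " GRIFFIN".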

This is what I came up with. It is ugly, admittedly. A much better way to do this, as you hinted in the question, would be to use two separate regex patterns, running the second only when the first has not found a match in the string.
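The two-pattern approach could look roughly like this. It is a sketch in Python rather than CFML, and the names `TRAILING_L`, `LEADING_L_WORD` and `clean` are made up for the example. It also assumes the neighbouring whitespace should be removed along with the match.

```python
import re

# First pattern: a standalone L at the end of the string, plus preceding whitespace.
TRAILING_L = re.compile(r"\s*\bL$")
# Second pattern: a leading word starting with L, plus the whitespace after it.
LEADING_L_WORD = re.compile(r"^L.*?\b\s*")

def clean(name):
    """Remove a trailing L; only if none was found, remove a leading L-word."""
    result, replaced = TRAILING_L.subn("", name, count=1)
    if replaced:
        return result
    return LEADING_L_WORD.sub("", name, count=1)

for name in ["LINDA L", "LINDSAY GRIFFIN", "LINDSAY LIGHTHOUSE", "LINDSAY PETERSON"]:
    print(repr(name), "->", repr(clean(name)))
```

Keeping the "only if the first did not match" rule in ordinary code, rather than in a lookahead, is what makes this version easier to read and change.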

+3

@femtoRgon almost has it, but that pattern leaves some stray whitespace behind. A complete CF solution:

 result = reReplace(string, "(\s*\bL$)|((?!.*\bL$)^L.*?\b\s*)", "", "ONE"); 

Where string will be "LINDA L" or "LINDSAY GRIFFIN" etc.

This handles all the examples you provided, but it follows the rules you specified quite literally.

+1

Note: this assumes that you have a single multi-line string and that you want both actions applied where relevant (i.e. the second does not depend on the first); if this is not what you want, you need to clarify the question.


Doing this with one regex makes things unnecessarily ugly (and therefore less maintainable) - here is a way to do this with two:

 Input.replaceFirst('\s+L(?=\n)','').replaceAll('(?<=\n)L\w+\s+','') 

The first expression removes the L (and the whitespace before it) from the first line (and since we use replaceFirst, only from the first line).

The second expression deletes all L-words at the beginning of the line (except for the first line, which does not have a new line before it).

(Since in both cases we will always have a \s+ match, there is no need for an explicit \b; you can use \b instead of \s+ if you do not want the whitespace to be removed.)
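For anyone who wants to try the two expressions outside CF, here is a rough Python equivalent (illustration only). It assumes the input is a single string with the names separated by "\n", as this answer does; `count=1` stands in for replaceFirst.

```python
import re

# Single multi-line input, as assumed by this answer.
text = "LINDA L\nLINDSAY GRIFFIN\nLINDSAY LIGHTHOUSE\nLINDSAY PETERSON"

# Step 1: remove the first " L" that is directly followed by a newline.
step1 = re.sub(r"\s+L(?=\n)", "", text, count=1)

# Step 2: remove every L-word (and the whitespace after it) that directly
# follows a newline; the first line has no newline before it, so it is skipped.
step2 = re.sub(r"(?<=\n)L\w+\s+", "", step1)

print(step2)
```

The lookbehind here is a single fixed-width character, which Python's re module accepts.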


If you prefer to do this using the CFML rereplace function, the equivalent looks like this:

 rereplace( rereplace(Input,'\s+L(?=\n)','') , '(\n)L\w+\s+' , '\1' , 'all' ) 

Personally, I find the other way more readable.

+1

You should look into regex conditionals.

http://www.regular-expressions.info/conditional.html

0