Regex positive lookbehind + negative lookahead

Given a string "A B C a b B", I want to match words that are repeated (regardless of case). The expected result would match "a" and "b" (the last occurrences of A and B) OR "A" and "B" (the first occurrences).

EDIT: I want to match only the first or the last occurrence of a word.

I know that this problem is best solved by splitting the string and counting each token (ignoring case).
However, I would like to come up with a regular expression that finds these words, just for the sake of practice.
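
For comparison, the split-and-count approach mentioned above might look roughly like the following C# sketch. The variable names are made up for illustration, and it assumes that words are separated by spaces.

```csharp
using System;
using System.Linq;

string input = "A B C a b B";

// Group the tokens case-insensitively and keep only groups with more than one member.
var repeatedGroups = input
    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
    .GroupBy(word => word, StringComparer.OrdinalIgnoreCase)
    .Where(group => group.Count() > 1)
    .ToList();

// First and last occurrence of each repeated word, in order of first appearance.
string[] firstOccurrences = repeatedGroups.Select(group => group.First()).ToArray();
string[] lastOccurrences = repeatedGroups.Select(group => group.Last()).ToArray();

Console.WriteLine(string.Join(" ", firstOccurrences)); // A B
Console.WriteLine(string.Join(" ", lastOccurrences));  // a B
```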

My first attempt was: (?=\b(\w+)\b.*\b(\1)\b)(\1)
However, it matches the first A, the first B, and the lowercase b (A B b).
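
The behaviour described above can be reproduced with a short C# sketch. It assumes RegexOptions.IgnoreCase is used for the case-insensitive comparison; the variable names are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "A B C a b B";
string firstAttempt = @"(?=\b(\w+)\b.*\b(\1)\b)(\1)";

// Collect "index:value" for every match so the positions are visible.
string[] attemptMatches = Regex.Matches(input, firstAttempt, RegexOptions.IgnoreCase)
    .Cast<Match>()
    .Select(m => $"{m.Index}:{m.Value}")
    .ToArray();

Console.WriteLine(string.Join(" ", attemptMatches)); // 0:A 2:B 8:b
```

The lookahead only checks that the word occurs again later, so every occurrence except the last one of each repeated word is matched.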

I then thought of combining a positive lookbehind with a negative lookahead to get the last occurrences of a repeated word: (?<=.*(?!.*(\w+).*)\1.*)\b\1\b
(In my head this translates to "a word that was matched before and will not be matched again")

Well, this does not work for me, unfortunately.

Is it possible to combine a positive lookbehind with a negative lookahead?
Can my regex be fixed?
I am trying to solve this in C#.

This is not homework.

1 answer

An interesting puzzle. Here is my solution:

(\b\w+\b)(?:(?=.*?\b\1\b)|(?<=\b\1\b.*?\1))

The rationale is as follows:

  • Match the word: (\b\w+\b)

  • Then either: (?:... |...)

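    • Make sure it is found again later: (?=.*?\b\1\b)
    • Or that it has been found before: (?<=\b\1\b.*?\1)

      Note that the second \1 in the lookbehind is necessary. The lookbehind is evaluated right after the word, so the trailing \1 accounts for the word that was just matched.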
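
A minimal C# sketch of how the pattern above could be applied, assuming RegexOptions.IgnoreCase for the case-insensitive comparison. The variable names are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "A B C a b B";
string pattern = @"(\b\w+\b)(?:(?=.*?\b\1\b)|(?<=\b\1\b.*?\1))";

// Every word that also occurs somewhere else in the string, before or after.
string[] repeated = Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(" ", repeated)); // A B a b B
```

Only C is skipped, since it is the only word with no other occurrence. The pattern relies on variable-length lookbehind, which the .NET regex engine supports.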


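Answer to the edit:

To match only the first occurrence, for instance:

(\b\w+\b)(?=.*?\b\1\b)(?<!\b\1\b.*?\1)

The rationale:

  • Match a word: (\b\w+\b)
  • Make sure it is found again later: (?=.*?\b\1\b)
  • Make sure it has not been found before: (?<!\b\1\b.*?\1)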

    (To match the last occurrence instead, swap the two: use a negative lookahead and a positive lookbehind.)
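
The first-occurrence pattern can be tried out with the C# sketch below. The lastOnly pattern is not taken from the answer: it is an assumed variant obtained by making the lookahead negative and the lookbehind positive. The helper Find and the variable names are likewise illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "A B C a b B";

// From the answer: the word is repeated later and has not been seen before.
string firstOnly = @"(\b\w+\b)(?=.*?\b\1\b)(?<!\b\1\b.*?\1)";
// Assumed variant (not from the answer): not repeated later, but seen before.
string lastOnly = @"(\b\w+\b)(?!.*?\b\1\b)(?<=\b\1\b.*?\1)";

// Hypothetical helper: returns the matched words for a given pattern.
string[] Find(string pattern) =>
    Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
        .Cast<Match>()
        .Select(m => m.Value)
        .ToArray();

string[] firsts = Find(firstOnly);
string[] lasts = Find(lastOnly);

Console.WriteLine(string.Join(" ", firsts)); // A B
Console.WriteLine(string.Join(" ", lasts));  // a B
```

Note that the last occurrence of B in the sample string is the uppercase B at the end, so the variant returns "a" and "B" rather than "a" and "b".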
