Given:
ABC content 1 123 content 2 ABC content 3 XYZ
Is it possible to create a regular expression that matches the shortest version of "ABC[\W\w]+?XYZ"?
Essentially, I'm looking for "ABC followed by any characters, terminated by XYZ, but don't match if I run into ABC in between" (but think of ABC as a potential regex itself, since it won't always be a fixed length... so ABC or ABcC could also match).
So, in a more general sense: REGEX1, followed by any characters and terminated by REGEX2, with no match if REGEX1 occurs in between.
In this example, I do not want the leading part of the string (the first ABC through "content 2"); the match should be just "ABC content 3 XYZ".
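To make the problem concrete, here is a minimal Python sketch of why the lazy quantifier alone is not enough. The variable names are only illustrative. A regex search always takes the leftmost possible start, so laziness only shortens the match on the right-hand side.

```python
import re

text = "ABC content 1 123 content 2 ABC content 3 XYZ"

# The lazy quantifier stops at the first XYZ it can reach, but the
# search still begins at the leftmost ABC in the string.
lazy = re.search(r"ABC[\W\w]+?XYZ", text)
lazy_match = lazy.group(0)

# The match therefore spans both ABC occurrences.
print(lazy_match)
print(lazy_match.count("ABC"))
```

Here the match is the whole input string rather than the wanted "ABC content 3 XYZ".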
(I'm sure this explanation could potentially need... further explanation, haha)
EDIT:
OK, now I see the need for further explanation! Thanks for the suggestions so far. At the very least, I'll give you all more to think about while I look into how each of your proposed solutions can be applied to my problem.
Proposition 1: Reverse the contents of the string and the regular expression.
This is definitely a fun hack that solves the problem as I explained it. In simplifying the issue, I failed to mention that the same thing can happen in reverse, because the end signature may also appear again later (and it turned out to do so in my specific situation). That introduces the problem illustrated below:
ABC content 1 123 content 2 ABC content 3 XYZ content 4 MNO content 5 XYZ
In this case, I would be checking for something like "ABC through XYZ", meaning to catch [ABC, content 3, XYZ], but accidentally catching [ABC, content 1, 123, content 2, ABC, content 3, XYZ]. Reversing that would catch [ABC, content 3, XYZ, content 4, MNO, content 5, XYZ] instead of the [ABC, content 3, XYZ] that we want. The point is to make it as general as possible, because I will also be searching for things that could have the same start signature (in this case, the regex "ABC") and different end signatures.
If there is a way to build the regex so that it encapsulates this kind of restriction, it would be much easier to just reuse that whenever I build a regex to search this type of string, rather than creating a custom search algorithm to deal with it.
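A small sketch of the reversal idea, assuming the pattern is reversed by hand ("ZYX.+?CBA"); the helper name reversed_search is hypothetical. It shows the hack working on the first sample string and failing on the longer one that has a second XYZ.

```python
import re

def reversed_search(text, reversed_pattern):
    """Search the reversed text, then flip the match back around."""
    m = re.search(reversed_pattern, text[::-1], re.DOTALL)
    return m.group(0)[::-1] if m else None

first = "ABC content 1 123 content 2 ABC content 3 XYZ"
second = ("ABC content 1 123 content 2 ABC content 3 XYZ "
          "content 4 MNO content 5 XYZ")

# Hand-reversed form of "ABC.+?XYZ" (an assumption for this demo).
rev_pattern = r"ZYX.+?CBA"

ok = reversed_search(first, rev_pattern)      # only one XYZ: works
bad = reversed_search(second, rev_pattern)    # two XYZ: overshoots

print(ok)
print(bad)
```

In the reversed text the search starts at the last XYZ, so the second string yields a match that runs from the second ABC all the way to the final XYZ.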
Proposition 2: A+B+C+[^A]+[^B]+[^C]+XYZ with the IGNORECASE flag
This seems fine when ABC is a fixed string. Think of it as a regex in itself, though. For instance:
Hello!GoodBye!Hello.Later.
This is a VERY simplified version of what I'm trying to do. I would want "Hello.Later." given the start regex Hello[!.] and the end regex Later[!.]. Running something as simple as Hello[!.].*Later[!.] would grab the entire string, but I want to say that if the start regex Hello[!.] occurs between the first start match and the first end match found, ignore the earlier start match.
The conversation below this question indicates that I may be limited by regular-language restrictions similar to the parenthesis-matching problem (Google it, it's interesting to think about). The purpose of this post is to check whether I really have to resort to writing an underlying algorithm that handles the problem I'm facing. I would very much like to avoid that if possible (using the simple example I gave above, it's pretty easy to build a finite state machine for it... I hope that holds up as it gets slightly more complex).
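For comparison, the "underlying algorithm" fallback could look roughly like the sketch below. It is only one possible reading of the requirement, and the function name innermost_matches is hypothetical: for each end match, keep the last start match that lies entirely before it.

```python
import re

def innermost_matches(text, start_re, end_re):
    """For each end match, pair it with the closest preceding start match."""
    start = re.compile(start_re)
    end = re.compile(end_re)
    results = []
    pos = 0
    while pos <= len(text):
        e = end.search(text, pos)
        if e is None:
            break
        last = None
        # Only start matches lying fully inside text[pos:e.start()] count.
        for s in start.finditer(text, pos, e.start()):
            last = s
        if last is not None:
            results.append(text[last.start():e.end()])
        # Always advance, even if the end regex matched an empty string.
        pos = e.end() if e.end() > pos else pos + 1
    return results

abc = innermost_matches(
    "ABC content 1 123 content 2 ABC content 3 XYZ content 4 MNO content 5 XYZ",
    r"ABC", r"XYZ")
hello = innermost_matches("Hello!GoodBye!Hello.Later.",
                          r"Hello[!.]", r"Later[!.]")
print(abc)
print(hello)
```

The second XYZ in the first input produces nothing because no start match remains after the first result, which is the behavior described for the "ABC through XYZ" case.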
Proposition 3: ABC(?:(?!ABC).)*?XYZ with the DOTALL flag
I like the idea of this if it actually allows ABC to be a regular expression. I will have to explore this when I get into the office tomorrow. Nothing looks too out of the ordinary at first glance, but I'm entirely new to Python regex (and new to actually applying regexes in code instead of just in theory homework).
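A quick sketch suggesting that this approach does extend to a regex start signature. The helper bounded_pattern is hypothetical; it simply substitutes the start regex into both the opening position and the negative lookahead, and it assumes the sub-patterns contain no capturing groups (otherwise findall would return the groups instead of whole matches).

```python
import re

def bounded_pattern(start_re, end_re):
    """Start regex, then characters that never begin another start
    match, then the end regex (shortest such run)."""
    return re.compile(
        rf"(?:{start_re})(?:(?!{start_re}).)*?(?:{end_re})",
        re.DOTALL,
    )

text = ("ABC content 1 123 content 2 ABC content 3 XYZ "
        "content 4 MNO content 5 XYZ")
abc_matches = bounded_pattern(r"ABC", r"XYZ").findall(text)

greeting = "Hello!GoodBye!Hello.Later."
hello_match = bounded_pattern(r"Hello[!.]", r"Later[!.]").search(greeting)

print(abc_matches)
print(hello_match.group(0))
```

The lookahead is tested before every consumed character, so an attempt that begins at an earlier start match fails as soon as it reaches the next one, and the search moves on to the later start.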