String pattern matching issue

Imagine we have a long string containing the substrings "cat" and "dog", as well as other random characters, for example:

cat x dog cat x cat x dog x dog x cat x dog x cat 

Here, 'x' represents any random sequence of characters (but not "cat" or "dog").

What I want to do is find each "cat" that is followed by any characters except "dog" and then another "cat". I want to delete the first "cat" in each such case.

In this example, I would like to remove the bracketed [cat], because no "dog" occurs between it and the next "cat":

 cat x dog [cat] x cat x dog x dog x cat x dog x cat 

The result should be:

 cat x dog x cat x dog x dog x cat x dog x cat 

How can I do that?

I was thinking of somehow using a regex like (n)(?=(N)), as recommended by VonC here

 (cat)(?=(.*cat)) 

to match all the "cat" pairs in the string. But I still don't know how to use this to remove every "cat" that is not followed by a "dog" before the next "cat".
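To make the limitation concrete, here is a minimal Java sketch that lists where this lookahead matches in the sample string. The class and method names are hypothetical and exist only for illustration. The pattern matches every "cat" that has any later "cat" in the string, whether or not a "dog" sits in between, so on the sample it reports every "cat" except the last one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Returns the start index of every match of (cat)(?=(.*cat)) in the input.
    public static List<Integer> matchStarts(String input) {
        Matcher m = Pattern.compile("(cat)(?=(.*cat))").matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String s = "cat x dog cat x cat x dog x dog x cat x dog x cat";
        // The "cat" occurrences start at indices 0, 10, 16, 34 and 46.
        System.out.println(matchStarts(s)); // prints [0, 10, 16, 34]
    }
}
```

Only the "cat" at index 10 should be removed, so the lookahead alone selects too much.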


The real problem I'm solving is in Java. But I'm really just looking for a general pseudocode / regular expression solution.

string regex pattern-matching
1 answer

Is there any specific reason you want to do this with just one RE call? I am not sure if this is possible in one RE.

If I had to do this, I would probably do it in two passes. First, mark each instance of "cat" and "dog" in the string; then write some code to determine which cats to remove, and remove them in a second pass.

Here is the pseudocode:

 // Find all the cats and dogs
 int[] catLocations = string.findIndex(/cat/);
 int[] dogLocations = string.findIndex(/dog/);

 int[] idsToRemove = doLogic(catLocations, dogLocations);

 // Remove each identified cat, from the end to the front
 for (int id : idsToRemove.reverse())
     string.removeSubstring(id, "cat".length());
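Since the question mentions Java, the two-pass idea could be sketched roughly as follows. This is only one possible reading of the pseudocode, not a definitive implementation: the class name CatRemover and the method removeCatsBeforeCat are made up for illustration, and the sketch assumes the "which cats to remove" logic means "a cat whose next cat-or-dog occurrence is a cat". It collects both words in a single scan with the alternation cat|dog instead of two separate searches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CatRemover {
    private static final Pattern TOKEN = Pattern.compile("cat|dog");

    public static String removeCatsBeforeCat(String input) {
        // Pass 1: record every "cat"/"dog" occurrence in order of appearance.
        List<Integer> starts = new ArrayList<>();
        List<Boolean> isCat = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            starts.add(m.start());
            isCat.add(m.group().equals("cat"));
        }

        // Pass 2: delete from the end to the front so earlier indices stay valid.
        // A cat is removed when the very next recorded occurrence is also a cat.
        StringBuilder sb = new StringBuilder(input);
        for (int i = starts.size() - 2; i >= 0; i--) {
            if (isCat.get(i) && isCat.get(i + 1)) {
                int start = starts.get(i);
                sb.delete(start, start + "cat".length());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "cat x dog cat x cat x dog x dog x cat x dog x cat";
        System.out.println(removeCatsBeforeCat(s));
    }
}
```

On the sample string this removes only the second "cat". The surrounding whitespace is left untouched, so a double space remains where that "cat" used to be; trimming it would be a separate step.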