String pattern matching issue

Question

Imagine we have a long string containing the substrings "cat" and "dog", as well as other random characters. For example:

cat x dog cat x cat x dog x dog x cat x dog x cat

Here, 'x' represents any random sequence of characters (but not "cat" or "dog").

What I want to do is find each "cat" that is followed by another "cat" with no "dog" anywhere in between, and delete the first "cat" of each such pair.

In this example, I would like to remove the bracketed [cat], because no "dog" occurs between it and the next "cat":

 cat x dog [cat] x cat x dog x dog x cat x dog x cat

The result should be:

 cat x dog x cat x dog x dog x cat x dog x cat

How can I do that?

I was thinking of somehow using a lookahead regex like (n)(?=(n)), as recommended by VonC here,

 (cat)(?=(.*cat))

to match all the "cat" pairs in a string. But I still don't know how to use this to remove every "cat" that has no "dog" between it and the next "cat".

The real problem I'm solving is in Java, but I'm really just looking for a general pseudocode or regular expression solution.

+7

string regex pattern-matching

nodmonkey Oct 28 '10 at 16:34

1 answer

zigdon · Accepted Answer · 2010-10-28T16:57:16+0000

Is there a specific reason you want to do this with a single regex call? I'm not sure it is possible in one regex.
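For what it's worth, one candidate single-pattern approach is a lookahead that refuses to step over "dog" on its way to the next "cat". The sketch below is only an illustration under that assumption, not a tested production solution; the class and method names are hypothetical. It matches each "cat" that can reach another "cat" by consuming only characters that do not start a "dog", and replaces that match with the empty string.

```java
import java.util.regex.Pattern;

class SingleRegexCatRemover {
    // "cat", then a lookahead: any run of characters, none of which
    // begins "dog", followed by another "cat". DOTALL lets '.' span newlines.
    private static final Pattern CAT_BEFORE_CAT =
            Pattern.compile("cat(?=(?:(?!dog).)*cat)", Pattern.DOTALL);

    // Hypothetical helper: deletes every "cat" that reaches the next
    // "cat" without passing a "dog".
    static String removeCats(String input) {
        return CAT_BEFORE_CAT.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeCats("cat x dog cat x cat x dog x cat"));
    }
}
```

Only the three characters of "cat" are deleted, so the whitespace that surrounded the removed word stays in place. On long inputs the lookahead rescans the text after every "cat", so a two-pass scan may be the cheaper option.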

If I had to do this, I would probably do it in two passes. First record the position of each "cat" and "dog" in the string, then write some code to decide which cats to remove, and remove them in a second pass.

Here is some pseudocode:

 // Find all the cats and dogs
 int[] catLocations = string.findIndex(/cat/);
 int[] dogLocations = string.findIndex(/dog/);
 int[] idsToRemove = doLogic(catLocations, dogLocations);

 // Remove each identified cat, from the end to the front
 for (int id : idsToRemove.reverse())
     string.removeSubstring(id, "cat".length());
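
Since the real problem is in Java, the pseudocode above could be fleshed out roughly as follows. This is a minimal sketch, not a definitive implementation: `findAll` and `catsToRemove` are hypothetical stand-ins for `findIndex` and `doLogic`, and the rule assumed for `doLogic` is "remove a cat when no dog starts between it and the next cat".

```java
import java.util.ArrayList;
import java.util.List;

class TwoPassCatRemover {
    // Pass 1 helper: start indices of every non-overlapping occurrence of needle.
    static List<Integer> findAll(String s, String needle) {
        List<Integer> out = new ArrayList<>();
        for (int i = s.indexOf(needle); i >= 0; i = s.indexOf(needle, i + needle.length())) {
            out.add(i);
        }
        return out;
    }

    // Stand-in for doLogic: a cat is removed when no dog lies between
    // it and the next cat. Both input lists are in ascending order.
    static List<Integer> catsToRemove(List<Integer> cats, List<Integer> dogs) {
        List<Integer> remove = new ArrayList<>();
        int d = 0; // index of the first dog not yet passed
        for (int c = 0; c + 1 < cats.size(); c++) {
            int current = cats.get(c);
            int next = cats.get(c + 1);
            while (d < dogs.size() && dogs.get(d) < current) d++;
            boolean dogBetween = d < dogs.size() && dogs.get(d) < next;
            if (!dogBetween) remove.add(current);
        }
        return remove;
    }

    static String removeCats(String input) {
        List<Integer> ids = catsToRemove(findAll(input, "cat"), findAll(input, "dog"));
        // Pass 2: delete from the end to the front so earlier indices stay valid.
        StringBuilder sb = new StringBuilder(input);
        for (int k = ids.size() - 1; k >= 0; k--) {
            sb.delete(ids.get(k), ids.get(k) + "cat".length());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeCats("cat x dog cat x cat x dog x dog x cat x dog x cat"));
    }
}
```

Deleting from the back is what keeps the recorded indices usable: removing a later "cat" never shifts the position of an earlier one. Only the word itself is deleted, so any surrounding whitespace is left as it was.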
