Regular expressions needed to match special patterns

I am desperately looking for regular expressions that match these scenarios:

1) Match alternating characters

I have a string like "This is my foobababababaf string" - and I want to match "babababa"

The only thing I know is the length of the fragment to search for. I do not know which characters or digits it consists of, only that they alternate.

I really don't know where to start :(

2) Match the combined groups

In a string like "This is my foobaafoobaaaoooo string", I want to match "aaaooo". As in 1), I do not know which characters or digits will appear. I only know that they come in two groups.

I experimented with things like (.)\1\1\1(.)\1\1\1 ...


I think you need something like this.

For alternating characters:

(?=(.)(?!\1)(.))(?:\1\2){2,} 

\0 is the entire alternating sequence; \1 and \2 are the two (distinct) alternating characters.
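Since the question says only the fragment length is known, one option is to fix the repetition count instead of using {2,}. A minimal Java sketch, assuming the length is at least 4 (for an odd length one extra \1 is appended); the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternatingFixedLength {
    // Hypothetical helper: alternating pattern for a fragment of exactly `length` characters.
    static String alternating(int length) {
        // (?=(.)(?!\1)(.)) captures two distinct characters without consuming them,
        // then (?:\1\2){k} consumes k full pairs.
        String pattern = "(?=(.)(?!\\1)(.))(?:\\1\\2){" + (length / 2) + "}";
        // For an odd length, one more occurrence of the first character closes the fragment.
        return length % 2 == 1 ? pattern + "\\1" : pattern;
    }

    // Returns the first fragment of the given length, or null if there is none.
    static String firstMatch(String text, int length) {
        Matcher m = Pattern.compile(alternating(length)).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("This is my foobababababaf string", 8)); // prints babababa
    }
}
```

Note that a fixed count finds a fragment of that length, which may be part of a longer alternating run.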

For runs of N and M identical characters, possibly separated by other characters (replace N and M with numbers):

 (?=(.))\1{N}.*?(?=(?!\1)(.))\2{M} 

\0 is the complete match, including the infix. \1 is the character repeated (at least) N times, and \2 is the character repeated (at least) M times.
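In question 2 the two runs are adjacent. If that must hold, one possible variant (not part of this answer) drops the .*? infix, so the second run has to start right where the first one ends. A sketch; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AdjacentRuns {
    // Variant of run(N,M) without the .*? infix: the two runs must touch.
    static String adjacentRuns(int n, int m) {
        return "(?=(.))\\1{" + n + "}(?=(?!\\1)(.))\\2{" + m + "}";
    }

    // Collects every non-overlapping match, left to right.
    static List<String> allMatches(String text, String pattern) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(pattern).matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(allMatches("This is my foobaafoobaaaoooo string", adjacentRuns(3, 3))); // prints [aaaooo]
        System.out.println(allMatches("xxyyyy axxayyyya zzzzzzzzzzzzzz", adjacentRuns(2, 4)));    // prints [xxyyyy]
    }
}
```

With the infix, run(2,4) on the second string also finds "xxayyyy"; without it, only the directly adjacent "xxyyyy" remains.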

Here's a test harness in Java.

    import java.util.regex.*;

    public class Regex3 {
        static String runNrunM(int N, int M) {
            return "(?=(.))\\1{N}.*?(?=(?!\\1)(.))\\2{M}"
                .replace("N", String.valueOf(N))
                .replace("M", String.valueOf(M));
        }

        static void dumpMatches(String text, String pattern) {
            Matcher m = Pattern.compile(pattern).matcher(text);
            System.out.println(text + " <- " + pattern);
            while (m.find()) {
                System.out.println("  match");
                for (int g = 0; g <= m.groupCount(); g++) {
                    System.out.format("    %d: [%s]%n", g, m.group(g));
                }
            }
        }

        public static void main(String[] args) {
            String[] tests = {
                "foobababababaf foobaafoobaaaooo",
                "xxyyyy axxayyyya zzzzzzzzzzzzzz"
            };
            for (String test : tests) {
                dumpMatches(test, "(?=(.)(?!\\1)(.))(?:\\1\\2){2,}");
            }
            for (String test : tests) {
                dumpMatches(test, runNrunM(3, 3));
            }
            for (String test : tests) {
                dumpMatches(test, runNrunM(2, 4));
            }
        }
    }

This produces the following output:

    foobababababaf foobaafoobaaaooo <- (?=(.)(?!\1)(.))(?:\1\2){2,}
      match
        0: [bababababa]
        1: [b]
        2: [a]
    xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.)(?!\1)(.))(?:\1\2){2,}
    foobababababaf foobaafoobaaaooo <- (?=(.))\1{3}.*?(?=(?!\1)(.))\2{3}
      match
        0: [aaaooo]
        1: [a]
        2: [o]
    xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.))\1{3}.*?(?=(?!\1)(.))\2{3}
      match
        0: [yyyy axxayyyya zzz]
        1: [y]
        2: [z]
    foobababababaf foobaafoobaaaooo <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{4}
    xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{4}
      match
        0: [xxyyyy]
        1: [x]
        2: [y]
      match
        0: [xxayyyy]
        1: [x]
        2: [y]

Explanation

  • (?=(.)(?!\1)(.))(?:\1\2){2,} has two parts
    • (?=(.)(?!\1)(.)) sets \1 and \2 using a lookahead
      • A nested negative lookahead ensures that \1 != \2
      • Capturing inside a lookahead lets \0 contain the full match (and not just the tail end)
    • (?:\1\2){2,} matches the sequence \1\2, which must be repeated at least twice
  • (?=(.))\1{N}.*?(?=(?!\1)(.))\2{M} has three parts
    • (?=(.))\1{N} captures \1 in a lookahead and then matches it N times
      • Because the capture happens in a lookahead, the repetition count can be N instead of N-1
    • .*? allows an infix to separate the two runs, matching reluctantly to keep it as short as possible
    • (?=(?!\1)(.))\2{M}
      • Works like the first part
      • A nested negative lookahead ensures that \1 != \2
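To see the capturing lookahead on its own, this sketch compiles only (?=(.)(?!\1)(.)): the match is zero-width, yet groups 1 and 2 are set, and a string made of one repeated character yields no pair at all. The class and field names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadCapture {
    // Only the capturing lookahead taken from (?=(.)(?!\1)(.))(?:\1\2){2,}.
    static final Pattern PAIR = Pattern.compile("(?=(.)(?!\\1)(.))");

    public static void main(String[] args) {
        Matcher m = PAIR.matcher("baba");
        if (m.find()) {
            // Zero-width match: group(0) is empty and end() equals start(),
            // but groups 1 and 2 already hold the two characters.
            System.out.println("[" + m.group() + "] " + m.group(1) + m.group(2) + " end=" + m.end()); // prints [] ba end=0
        }
        // (?!\1) rejects pairs of equal characters, so "oooo" has no valid pair.
        System.out.println(PAIR.matcher("oooo").find()); // prints false
    }
}
```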

Note that the run(N,M) expression also matches longer runs. For example, run(2,2) matches "xxxyyy":

    xxxyyy <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{2}
      match
        0: [xxxyy]
        1: [x]
        2: [y]

It also does not find overlapping matches. That is, in "xx11yyy222" there is only one run(2,3) match.

    xx11yyy222 <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{3}
      match
        0: [xx11yyy]
        1: [x]
        2: [y]
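If overlapping matches are wanted, one option (an addition, not part of the answer) is to restart the search one character after each match start with Matcher.find(int), which resets the matcher and searches from the given index. A sketch reusing the run(N,M) pattern; the helper names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlappingRuns {
    // Same pattern as runNrunM in the harness, built with concatenation.
    static String runNrunM(int n, int m) {
        return "(?=(.))\\1{" + n + "}.*?(?=(?!\\1)(.))\\2{" + m + "}";
    }

    // Restarts the search one character after each match start, so matches may overlap.
    static List<String> overlappingMatches(String text, String pattern) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(pattern).matcher(text);
        int from = 0;
        while (from < text.length() && matcher.find(from)) {
            found.add(matcher.group());
            from = matcher.start() + 1;
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [xx11yyy, 11yyy, yyy222, yy222]
        System.out.println(overlappingMatches("xx11yyy222", runNrunM(2, 3)));
    }
}
```

In "yyy222" the third y ends up in the .*? infix, which is the "longer runs" effect described above.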

Assuming you are using Perl/PCRE:

  • (.{2})\1+ or ((.)(?!\2)(.))\1+. The second regex rules out matches like oooo.
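The answer targets Perl/PCRE, but both patterns use only backreferences, a negative lookahead and quantifiers, which java.util.regex supports as well. A small Java check of the difference; the class, field and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedPair {
    static final String ANY_PAIR = "(.{2})\\1+";               // any 2-character unit, repeated
    static final String DISTINCT_PAIR = "((.)(?!\\2)(.))\\1+"; // the two characters must differ

    // Returns the first match, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(ANY_PAIR, "oooo"));                // prints oooo
        System.out.println(firstMatch(DISTINCT_PAIR, "oooo"));           // prints null
        System.out.println(firstMatch(DISTINCT_PAIR, "foobababababaf")); // prints bababababa
    }
}
```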

Update: for 2., use ((.)\2{N}).*?((?!\2)(.)\4{M}). Remove (?!\2) if you also want matches like oooaoooo, and replace N and M with n-1 and m-1, where n and m are the lengths of the two runs.
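A sketch of the n-1/m-1 substitution in Java: (.) already consumes one character of each run, so a run of length n needs \2{n-1}. The helper name is hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoRunsPcreStyle {
    // Builds ((.)\2{N}).*?((?!\2)(.)\4{M}) with N = n - 1 and M = m - 1.
    // Group 1 is the first run, group 3 the second run.
    static Pattern twoRuns(int n, int m) {
        return Pattern.compile("((.)\\2{" + (n - 1) + "}).*?((?!\\2)(.)\\4{" + (m - 1) + "})");
    }

    public static void main(String[] args) {
        Matcher matcher = twoRuns(3, 3).matcher("This is my foobaafoobaaaoooo string");
        if (matcher.find()) {
            System.out.println(matcher.group());  // prints aaaooo
            System.out.println(matcher.group(1)); // prints aaa
            System.out.println(matcher.group(3)); // prints ooo
        }
    }
}
```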


Well, this works for the first one:

 ((.)(.))(\2\3)+ 

JavaScript examples:

    a = "This is my foobababababaf string"
    console.log(a.replace(/(.)(.)(\1\2)+/, "<<$&>>"))

    a = "This is my foobaafoobaaaooo string"
    console.log(a.replace(/(.)\1+(.)\2+/, "<<$&>>"))

Source: https://habr.com/ru/post/1314764/

