Regex to match a specific file format with blank lines

I am trying to use regex to match a file in the following format:

 FILTER
 <data>
 ORDER
 <data>

The <data> is what I need to extract, which would be very simple except for the following complications:

1) This pattern can repeat (with no separator line between repetitions).

2) <data> may be missing.

In particular, this file is valid:

 FILTER
 test1
 ORDER
 test2
 FILTER
 test3
 ORDER
 FILTER
 ORDER

And should give me the following groups:

"test1", "test2", "test3", "", "", ""

I have already tried the regex: (?:FILTER\n(.*)\nORDER\n(.*))*

Here is a test for regex101.

I am new to regex, any help would be appreciated.

2 answers

You can use a regex that combines lazy dot matching with a tempered greedy token:

 (?s)FILTER(.*?)ORDER((?:(?!FILTER).)*)
            ^-^       ^--------------^

Use the DOTALL modifier with this regex (the inline (?s) flag enables it). Here is the regex demo. The .*? matches any characters, but as few as possible, so it stops at the first ORDER. The tempered greedy token (?:(?!FILTER).)* matches any text that does not contain FILTER. It works like a negated character class, but for a multi-character sequence.

You can unroll it as follows:

 FILTER([^O]*(?:O(?!RDER)[^O]*)*)ORDER([^F]*(?:F(?!ILTER)[^F]*)*) 

See the regex demo (and this regex does not require DOTALL mode).
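As a minimal sketch of how the unrolled pattern could be used from Java, the example below runs it against the sample input from the question. The class name UnrolledDemo and the helper extract are made up for illustration. No DOTALL flag is set, because negated character classes such as [^O] already match newlines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnrolledDemo {
    // Unrolled pattern from the answer above (no flags needed).
    static final Pattern UNROLLED = Pattern.compile(
        "FILTER([^O]*(?:O(?!RDER)[^O]*)*)ORDER([^F]*(?:F(?!ILTER)[^F]*)*)");

    // Hypothetical helper: collects the trimmed contents of both groups
    // for every FILTER/ORDER pair found in the input.
    static List<String> extract(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = UNROLLED.matcher(input);
        while (m.find()) {
            results.add(m.group(1).trim());
            results.add(m.group(2).trim());
        }
        return results;
    }

    public static void main(String[] args) {
        String s = "FILTER\ntest1\nORDER\ntest2\nFILTER\ntest3\nORDER\nFILTER\nORDER";
        System.out.println(extract(s)); // => [test1, test2, test3, , , ]
    }
}
```

Data that merely contains the letters O or F (for example FOO or OFF) is still captured whole, since the lookaheads only stop at a full ORDER or FILTER.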

 // Requires: java.util.ArrayList, java.util.List,
 //           java.util.regex.Matcher, java.util.regex.Pattern
 String s = "FILTER\ntest1\nORDER\ntest2\nFILTER\ntest3\nORDER\nFILTER\nORDER";
 Pattern pattern = Pattern.compile("(?s)FILTER(.*?)ORDER((?:(?!FILTER).)*)");
 Matcher matcher = pattern.matcher(s);
 List<String> results = new ArrayList<>();
 while (matcher.find()) {
     // Group 1 is the text between FILTER and ORDER
     if (matcher.group(1) != null) {
         results.add(matcher.group(1).trim());
     }
     // Group 2 is the text between ORDER and the next FILTER (or the end)
     if (matcher.group(2) != null) {
         results.add(matcher.group(2).trim());
     }
 }
 System.out.println(results); // => [test1, test2, test3, , , ]

See the IDEONE demo.

If you need to make sure that the FILTER and ORDER delimiters appear on lines of their own, just put ^ and $ around them and add the MULTILINE modifier (so that ^ can match at the beginning of a line, and $ can match at the end of a line):

 (?sm)^FILTER$(.*?)^ORDER$((?:(?!^FILTER$).)*)
      ^      ^     ^     ^

See another regex demo.
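To illustrate what the anchors buy you, here is a small sketch that applies the anchored pattern to data lines that happen to contain the delimiter words. The class name AnchoredDemo, the helper extract, and the sample input are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchoredDemo {
    // (?s) lets . match newlines; (?m) lets ^ and $ match at line boundaries.
    static final Pattern ANCHORED = Pattern.compile(
        "(?sm)^FILTER$(.*?)^ORDER$((?:(?!^FILTER$).)*)");

    // Hypothetical helper: collects the trimmed contents of both groups.
    static List<String> extract(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = ANCHORED.matcher(input);
        while (m.find()) {
            results.add(m.group(1).trim());
            results.add(m.group(2).trim());
        }
        return results;
    }

    public static void main(String[] args) {
        // Made-up input: "BORDER" contains ORDER, and one data line contains FILTER.
        String s = "FILTER\nBORDER first\nORDER\nnot a FILTER\nFILTER\nx\nORDER";
        System.out.println(extract(s)); // => [BORDER first, not a FILTER, x, ]
    }
}
```

With the unanchored pattern, group 1 of the first match would be cut short at the ORDER inside BORDER; the anchored version only treats FILTER and ORDER as delimiters when they fill a whole line.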


I would use the following regular expression:

 FILTER(?:\n(?!ORDER)(.*))?\nORDER(?:\n(?!FILTER)(.*))? 

You can check it on regex101
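One thing to watch for with this pattern in Java: when <data> is missing, the corresponding optional group does not take part in the match, so group(n) returns null rather than an empty string. Also, because DOTALL is not set, each (.*) captures a single line only. The sketch below, with the made-up class name OptionalGroupsDemo and helper extract, maps null to "" to get the groups the question asks for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalGroupsDemo {
    // Pattern from the answer above; each data line is optional.
    static final Pattern OPTIONAL = Pattern.compile(
        "FILTER(?:\\n(?!ORDER)(.*))?\\nORDER(?:\\n(?!FILTER)(.*))?");

    // Hypothetical helper: a group that did not participate in the match
    // is reported as null by Matcher.group, so it is replaced with "".
    static List<String> extract(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = OPTIONAL.matcher(input);
        while (m.find()) {
            for (int i = 1; i <= 2; i++) {
                String g = m.group(i);
                results.add(g == null ? "" : g);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        String s = "FILTER\ntest1\nORDER\ntest2\nFILTER\ntest3\nORDER\nFILTER\nORDER";
        System.out.println(extract(s)); // => [test1, test2, test3, , , ]
    }
}
```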
