Regex needed to find a substring between two tokens

Question

I suspect this has already been answered somewhere, but I can't find it, so...

I need to extract the string between two tokens in a larger string, where the second token will probably appear again later, meaning... (pseudocode...)

 myString = "A=abc;B=def_3%^123+-;C=123;";
 myB = getInnerString(myString, "B=", ";");

 method getInnerString(inStr, startToken, endToken) {
     return inStr.replace(EXPRESSION, "$1");
 }

So when I run this using the expression ".+B=(.+);.+" I get "def_3%^123+-;C=123;", apparently because it just looks for the LAST instance of ';' in the string instead of stopping at the first one it comes to.

I tried using a lookahead, (?=), to find that first ';', but it gives me the same result.

I can't find a regex reference that explains how to specify the NEXT token rather than the one at the end.

Any and all help is greatly appreciated.

A similar question on SO:

+4

regex regex-greedy

Genia S. Jan 28 '09 at 21:56

3 answers

Try the following:

 B=([^;]+);

This matches everything between B= and ; as long as it is not a ; itself. Thus, it matches everything between B= and the first ; after it.
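For anyone who wants to try this, here is a minimal Java sketch of the negated character class approach. The class and method names (`NegatedClassDemo`, `extractB`) are made up for illustration; only the pattern itself comes from the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegatedClassDemo {
    // Hypothetical helper: returns the text between "B=" and the first
    // following ';', or null if the pattern does not match at all.
    static String extractB(String input) {
        // [^;]+ can never run past a semicolon, so no backtracking over ';' is needed.
        Matcher m = Pattern.compile("B=([^;]+);").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractB("A=abc;B=def_3%^123+-;C=123;")); // def_3%^123+-
        System.out.println(extractB("A=abc;B=;C=123;"));             // null
    }
}
```

Note that `[^;]+` requires at least one character, so an empty value such as `B=;` is not matched; `[^;]*` would accept it.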

+5

Gumbo Jan 28 '09 at 21:58

(This is a continuation of the conversation in the comments on Evan's answer.)

Here's what happens when your (corrected) regular expression is applied: first, the leading .+ matches the entire string. Then it backtracks, giving up most of the characters it just matched until it reaches a point where B= can match. Then (.+?) matches (and captures) everything it sees until the next part, the semicolon, can match. Then the final .+ gobbles up the remaining characters.
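One way to see this for yourself is to wrap the leading and trailing .+ in capturing groups and print what each part ended up with. The sketch below does that in Java; the instrumented pattern `(.+)B=(.+?);(.+)` and the names `BacktrackingTrace` and `pieces` are my own additions for illustration, not part of the original regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BacktrackingTrace {
    // Hypothetical helper: returns what the leading .+, the reluctant group,
    // and the trailing .+ each consumed, or null if the string does not match.
    static String[] pieces(String input) {
        Matcher m = Pattern.compile("(.+)B=(.+?);(.+)").matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] p = pieces("A=abc;B=def_3%^123+-;C=123;");
        System.out.println("leading .+  : " + p[0]); // A=abc;
        System.out.println("captured    : " + p[1]); // def_3%^123+-
        System.out.println("trailing .+ : " + p[2]); // C=123;
    }
}
```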

All you're really interested in is the "B=" and the ";" and whatever is between them, so why match the rest of the string? The only reason to do that is so you can replace the whole string with the contents of the capture group. But why bother doing that if you can access the contents of the group directly? Here's a demonstration (in Java, because I can't tell which language you're using):

 String s = "A=abc;B=def_3%^123+-;C=123;";
 Pattern p = Pattern.compile("B=(.*?);");
 Matcher m = p.matcher(s);
 if (m.find()) {
     System.out.println(m.group(1));
 }

Why do a "replace" when a "find" is so much more straightforward? Probably because your API makes it easier; that's why we do it in Java. Java has several regex-oriented convenience methods in its String class: replaceAll(), replaceFirst(), split(), and matches() (which returns true only if the regex matches the entire string), but no find(). And there is no convenience method for accessing capture groups, either. We can't match the elegance of Perl one-liners like this:

 print $1 if 'A=abc;B=def_3%^123+-;C=123;' =~ /B=(.*?);/;

... so we content ourselves with hacks like this:

 System.out.println("A=abc;B=def_3%^123+-;C=123;".replaceFirst(".+B=(.*?);.+", "$1"));

Just to be clear, I'm not saying you shouldn't use these hacks, or that there's anything wrong with Evan's answer; there isn't. I just think we should understand why we use them and what trade-offs we're making when we do.
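To make one of those trade-offs concrete, here is a small Java sketch comparing the two approaches on inputs where the replace-style pattern cannot match. The class and method names (`ReplaceVsFind`, `viaReplace`, `viaFind`) are invented for this example. The point it illustrates is that replaceFirst() hands back the input unchanged when nothing matches, so a failed match is easy to mistake for a result, whereas find() lets you detect the failure.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceVsFind {
    // The replace-style hack: returns the original string unchanged when the regex does not match.
    static String viaReplace(String input) {
        return input.replaceFirst(".+B=(.*?);.+", "$1");
    }

    // The find-style version: returns null when there is no match.
    static String viaFind(String input) {
        Matcher m = Pattern.compile("B=(.*?);").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Normal case: both agree.
        System.out.println(viaReplace("A=abc;B=def_3%^123+-;C=123;")); // def_3%^123+-
        System.out.println(viaFind("A=abc;B=def_3%^123+-;C=123;"));    // def_3%^123+-

        // No B at all: the hack silently returns the input.
        System.out.println(viaReplace("A=abc;C=123;")); // A=abc;C=123;
        System.out.println(viaFind("A=abc;C=123;"));    // null

        // B is the first field: the leading .+ has nothing to consume, so the hack fails.
        System.out.println(viaReplace("B=def;C=123;")); // B=def;C=123;
        System.out.println(viaFind("B=def;C=123;"));    // def
    }
}
```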

+2

Alan Moore Jan 30 '09 at 6:36

Evan Fosmark · Accepted Answer · 2009-01-28T21:58:18+0000

You're using a greedy pattern because you haven't put a ? after the quantifier. Try this:

 ".+B=(.+?);.+"
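A quick way to see the difference the ? makes is to compare a greedy and a reluctant quantifier side by side. This Java sketch uses trimmed-down patterns (`B=(.+);` and `B=(.+?);`, without the surrounding .+) purely for illustration; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Hypothetical helper: returns group 1 of the first match, or null if there is none.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "A=abc;B=def_3%^123+-;C=123;";
        // Greedy: .+ grabs as much as possible, so the match ends at the LAST ';'.
        System.out.println(firstGroup("B=(.+);", s));  // def_3%^123+-;C=123
        // Reluctant: .+? grabs as little as possible, so the match ends at the FIRST ';'.
        System.out.println(firstGroup("B=(.+?);", s)); // def_3%^123+-
    }
}
```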
