(This continues the conversation from the comments on Evan's answer.)
Here's what happens when your (corrected) regular expression is applied: first, the leading .+ matches the entire line. Then it backtracks, giving up most of the characters it just matched, until it reaches a point at which B= can match. Then (.+?) matches (and captures) everything it sees until the next part, the semicolon, can match. Then the final .+ consumes the remaining characters.
All you're really interested in is "B=", ";" and whatever is between them, so why match the rest of the line? The only reason to do that is so you can replace the entire line with the contents of the capture group. But why bother with that if you can access the contents of the group directly? Here's a demo (in Java, because I can't tell which language you're using):
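To see the backtracking in action, here is a small sketch. The class and method names are mine, and the second input (with two B= fields) is made up for illustration. Because the leading .+ gives characters back from the right-hand end of the line, the first B= it reaches is the last one in the line:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BacktrackDemo {
    // Returns capture group 1 if the regex matches the whole input, else null.
    static String capture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String regex = ".+B=(.+?);.+";
        // One B= field: the group holds its value.
        System.out.println(capture(regex, "A=abc;B=def_3%^123+-;C=123;")); // def_3%^123+-
        // Two B= fields (hypothetical input): the greedy .+ backs up from
        // the end of the line, so the LAST B= is the one that gets captured.
        System.out.println(capture(regex, "A=abc;B=first;B=second;C=123;")); // second
    }
}
```

A plain find() with B=(.*?); scans from the left instead, so on the second input it would capture "first".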
    String s = "A=abc;B=def_3%^123+-;C=123;";
    Pattern p = Pattern.compile("B=(.*?);");
    Matcher m = p.matcher(s);
    if (m.find()) {
        System.out.println(m.group(1));
    }
Why do a "replace" when a "find" is so much more straightforward? Probably because your API makes it easier; that's why we do it in Java. Java has several regex-oriented convenience methods in its String class: replaceAll(), replaceFirst(), split() and matches() (which returns true if the regex matches the entire string), but not find(). And there is no convenience method for accessing capture groups, either. We can't match the elegance of Perl one-liners like this:
    print $1 if 'A=abc;B=def_3%^123+-;C=123;' =~ /B=(.*?);/;
...so we content ourselves with hacks like this:
    System.out.println("A=abc;B=def_3%^123+-;C=123;"
        .replaceFirst(".+B=(.*?);.+", "$1"));
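If the missing convenience method is the main reason for reaching for replaceFirst(), one option is to write that method once and reuse it. A minimal sketch follows; RegexUtil and firstGroup are hypothetical names of my own, not part of the JDK, and the helper assumes the regex contains at least one capture group:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexUtil {
    // Hypothetical helper: capture group 1 of the first match, or an empty
    // Optional when the regex does not match anywhere in the input.
    // Assumes the regex defines at least one capture group.
    static Optional<String> firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? Optional.ofNullable(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String s = "A=abc;B=def_3%^123+-;C=123;";
        System.out.println(firstGroup("B=(.*?);", s).orElse("(no match)")); // def_3%^123+-
        System.out.println(firstGroup("Z=(.*?);", s).orElse("(no match)")); // (no match)
    }
}
```

With something like this in place, the call site is about as short as the replaceFirst() version, and a failed match is reported explicitly.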
Just to be clear, I'm not saying you shouldn't use these hacks, or that there's anything wrong with Evan's answer; there isn't. I just think we should understand why we use them and what trade-offs we're making when we do.
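As a rough illustration of those trade-offs, the sketch below (class name, method name and the shorter test inputs are my own) shows three cases where the replaceFirst() version silently hands back the original string instead of signalling that nothing was extracted:

```java
class ReplaceHackTradeoffs {
    // The replace-the-whole-line hack, wrapped in a method for testing.
    static String viaReplace(String input) {
        return input.replaceFirst(".+B=(.*?);.+", "$1");
    }

    public static void main(String[] args) {
        // Normal case: the whole line is replaced by the captured value.
        System.out.println(viaReplace("A=abc;B=def;C=123;")); // def
        // No B= field: nothing matches, so the input comes back unchanged.
        System.out.println(viaReplace("A=abc;C=123;"));       // A=abc;C=123;
        // B= is the first field: the leading .+ needs at least one
        // character before it, so the regex fails to match.
        System.out.println(viaReplace("B=def;C=123;"));       // B=def;C=123;
        // B= is the last field: the trailing .+ needs at least one
        // character after the semicolon, so the regex fails again.
        System.out.println(viaReplace("A=abc;B=def;"));       // A=abc;B=def;
    }
}
```

With find(), each of the failing cases is either handled correctly (B= first or last) or reported as a failed match (no B= at all).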