Explanation Needed for Old Regular Expression Code

I came across some old code that I wrote several years ago, and I no longer have any idea why it works. I need to understand it before I go ahead and change the code.

I have CSV data

"abc",123456789,"def" 

Sometimes the server returns

 "abc",123,456,789,"def" 

So, I wrote the following code to solve it.

    import java.util.regex.Pattern;

    public class Sandbox {
        // Either captures a quote followed by a comma (group 1), or matches a
        // bare comma that is followed by digits/commas and then ,"
        private static final Pattern digitPattern = Pattern.compile("(\",)|,(?=[\\d,]+,\")");

        /**
         * @param args the command line arguments
         */
        public static void main(String[] args) {
            // "abc",123,456,789,"def"
            String data = "\"abc\",123,456,789,\"def\"";
            final String result = digitPattern.matcher(data).replaceAll("$1");
            // "abc",123456789,"def"
            System.out.println(result);
        }
    }

However, when I look back at the code, I have no idea why |, and [\\d,]+ help me remove the commas. What part of the input does |, match?

A step-by-step explanation of how the matching works would be much appreciated.

1 answer

You replace

 (\",)|,(?=[\\d,]+,\") 

with

 $1 

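In other words, you keep any comma that comes immediately after a double quote (that is the (\",) part, which is captured and written back by $1 ), but you remove any comma that is followed by one or more digits or commas and then a comma and a double quote (that is the lookahead assertion, (?=[\\d,]+,\") ). The lookahead only checks what follows; it consumes nothing, so the second alternative matches just the single comma.

The engine scans the input from left to right and, at each position, tries the alternatives of | from left to right. The first alternative starts matching at the double quote, one character before the comma, so the comma after "abc" is consumed as part of that match and the second alternative never gets a chance to remove it. The engine then moves on to look for the next match. When the second alternative matches, group 1 does not take part in the match, so $1 is empty and the matched comma is effectively replaced with an empty string. The last comma, the one before "def", is followed directly by a double quote rather than by digits, so neither alternative matches it and it is left alone.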
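To see this step by step, here is a small sketch that lists every match the pattern finds in the sample input and whether group 1 took part. The class name RegexTrace, the trace helper and the LOOKAHEAD_ONLY pattern are made up for this illustration and are not part of the original code. LOOKAHEAD_ONLY is the same pattern without the first alternative, to show what that alternative protects against.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTrace {
    // The pattern from the question.
    static final Pattern FULL = Pattern.compile("(\",)|,(?=[\\d,]+,\")");
    // Hypothetical variant without the first alternative, for comparison only.
    static final Pattern LOOKAHEAD_ONLY = Pattern.compile(",(?=[\\d,]+,\")");

    // Returns one entry per match: start index, matched text, and whether
    // group 1 participated ("kept") or not ("removed").
    static List<String> trace(String data) {
        List<String> steps = new ArrayList<>();
        Matcher m = FULL.matcher(data);
        while (m.find()) {
            // group(1) is null when the second alternative matched.
            String fate = (m.group(1) == null) ? "removed" : "kept";
            steps.add(m.start() + " [" + m.group() + "] " + fate);
        }
        return steps;
    }

    public static void main(String[] args) {
        String data = "\"abc\",123,456,789,\"def\"";

        for (String step : trace(data)) {
            System.out.println(step);
        }
        // 4 [",] kept
        // 9 [,] removed
        // 13 [,] removed

        System.out.println(FULL.matcher(data).replaceAll("$1"));
        // "abc",123456789,"def"

        System.out.println(LOOKAHEAD_ONLY.matcher(data).replaceAll(""));
        // "abc"123456789,"def"
    }
}
```

The last line of output shows why the first alternative is there: without it, the comma after "abc" also satisfies the lookahead and is removed along with the others.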
