If you are parsing a CSV file, some of your values may contain double quotes, so you might need something more complex. For instance:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Group 1 captures one column value per match.
    Pattern splitCommas = Pattern.compile("(?:^|,)((?:[^\",]|\"[^\"]*\")*)");
    Matcher m = splitCommas.matcher("11,=\"12,345\",ABC,,JKL");
    while (m.find()) {
        System.out.println(m.group(1));
    }
or in Groovy:
    java.util.regex.Pattern.compile('(?:^|,)((?:[^",]|"[^"]*")*)')
        .matcher("11,=\"12,345\",ABC,,JKL")
        .iterator()
        .collect { it[1] }
This code handles:
- empty lines (no values or commas on them)
- empty columns, including an empty last column
- values enclosed in double quotes, including commas inside the double quotes
- but it does not handle two consecutive double quotes used to escape a double quote
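The cases listed above can be checked with a small helper. This is only a minimal sketch: the class name `CsvSplitDemo` and the method `split` are made up for illustration, and only the pattern itself comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsvSplitDemo {
    // Same pattern as in the answer above.
    private static final Pattern SPLIT_COMMAS =
            Pattern.compile("(?:^|,)((?:[^\",]|\"[^\"]*\")*)");

    // Hypothetical helper: collects group 1 of every match on one line.
    static List<String> split(String line) {
        List<String> columns = new ArrayList<>();
        Matcher m = SPLIT_COMMAS.matcher(line);
        while (m.find()) {
            columns.add(m.group(1));
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(split(""));             // empty line
        System.out.println(split("a,,c,"));        // empty middle and last columns
        System.out.println(split("x,\"a,b\",y"));  // comma inside double quotes
    }
}
```

Note that a quoted value comes back with its quotes still attached: the second column of the last line is `"a,b"`, quotes included.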
The pattern consists of:
- (?:^|,) matches the start of the line or the comma after the previous column, without adding it to the captured group
- ((?:[^",]|"[^"]*")*) matches the column value and is built up as follows:
  - [^",] is a character that is neither a comma nor a double quote
  - "[^"]*" is a double quote, followed by zero or more other characters, ending in another double quote
  - a non-capturing group combines the two as alternatives: (?:[^",]|"[^"]*")
  - * repeats that group zero or more times: (?:[^",]|"[^"]*")*
  - a capturing group around the result yields the column value: ((?:[^",]|"[^"]*")*)
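The same breakdown can be written directly into the pattern with Java's `Pattern.COMMENTS` flag, which ignores whitespace and everything from `#` to the end of the line. The sketch below is one way to lay it out; the class name `CommentedCsvPattern` is made up for illustration, and the pattern is meant to be equivalent to the one-line version.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentedCsvPattern {
    // The pattern from the answer, spelled out piece by piece.
    static final Pattern SPLIT_COMMAS = Pattern.compile(
            "(?:^|,)          # start of line or separating comma, not captured\n"
          + "(                # capturing group 1: the column value\n"
          + "  (?:            # non-capturing group for the two alternatives\n"
          + "    [^\",]       # a character that is neither quote nor comma\n"
          + "    |            # or\n"
          + "    \"[^\"]*\"   # a quoted run, which may contain commas\n"
          + "  )*             # repeated zero or more times\n"
          + ")",
            Pattern.COMMENTS);

    public static void main(String[] args) {
        Matcher m = SPLIT_COMMAS.matcher("11,=\"12,345\",ABC,,JKL");
        while (m.find()) {
            System.out.println(m.group(1));
        }
    }
}
```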
Handling escaped double quotes is left as an exercise for the reader.
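One possible approach to that exercise, offered as a sketch rather than a definitive solution, is to post-process each captured value. For well-formed input, a doubled quote inside a quoted value matches as the end of one quoted run followed by the start of the next, so the split still lands on the right commas; the value just keeps its surrounding quotes and doubled quotes. The helper names `unquote` and `splitAndUnquote` below are hypothetical, and the code assumes a value is either fully wrapped in quotes or should be left as it is (so `="12,345"` stays untouched).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsvUnquoteSketch {
    // Same pattern as in the answer above.
    private static final Pattern SPLIT_COMMAS =
            Pattern.compile("(?:^|,)((?:[^\",]|\"[^\"]*\")*)");

    // Hypothetical helper: if the whole value is wrapped in double quotes,
    // drop the wrapping pair and turn each "" into a single ".
    static String unquote(String value) {
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            return value.substring(1, value.length() - 1).replace("\"\"", "\"");
        }
        return value; // unquoted or partly quoted values are left untouched
    }

    // Hypothetical helper: splits one line and unquotes every column.
    static List<String> splitAndUnquote(String line) {
        List<String> columns = new ArrayList<>();
        Matcher m = SPLIT_COMMAS.matcher(line);
        while (m.find()) {
            columns.add(unquote(m.group(1)));
        }
        return columns;
    }

    public static void main(String[] args) {
        // Input line: "say ""hi"", ok",plain
        System.out.println(splitAndUnquote("\"say \"\"hi\"\", ok\",plain"));
    }
}
```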