Create a Guava Splitter that handles quoted strings

I would like to create a Guava Splitter for Java that treats a quoted string as a single block. For example, I would like the following assertion to hold:

    @Test
    public void testSplitter() {
        String toSplit = "a,b,\"c,d\\\"\",e";
        List<String> expected = ImmutableList.of("a", "b", "c,d\"", "e");
        Splitter splitter = Splitter.onPattern(...);
        List<String> actual = ImmutableList.copyOf(splitter.split(toSplit));
        assertEquals(expected, actual);
    }

I can write a regular expression that finds all the elements and skips the commas, but I cannot find a regular expression that works as the delimiter for a Splitter.

If this is not possible, please just say so, and I will build the list from the findAll regular expression instead.
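
For reference, the "find all elements" fallback mentioned above could look roughly like the sketch below. It uses only java.util.regex, not Guava. The class name QuotedFieldFinder and the method findFields are hypothetical, and the pattern assumes backslash escapes inside quotes and skips empty fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: matches the fields themselves instead of the delimiters.
class QuotedFieldFinder {
    // Group 1: body of a quoted field (backslash escapes allowed).
    // Group 2: an unquoted field (a run of chars that are neither comma nor quote).
    private static final Pattern FIELD =
            Pattern.compile("\"((?:\\\\.|[^\"\\\\])*)\"|([^,\"]+)");

    static List<String> findFields(String input) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                // Drop the backslash of each escape sequence: \" becomes ", \\ becomes \
                fields.add(m.group(1).replaceAll("\\\\(.)", "$1"));
            } else {
                fields.add(m.group(2));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // The input from the test above; the third field comes out as c,d"
        System.out.println(findFields("a,b,\"c,d\\\"\",e"));
    }
}
```

Unlike a delimiter-based split, this approach strips the surrounding quotes and unescapes the content, which is what the expected list in the test asks for.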


This seems like a job for a CSV library such as opencsv. Separating values and handling cases such as quoted blocks is exactly what those libraries are for.


I have the same problem (except that I do not need to support escaping the quote character). I do not like pulling in another library for such a simple thing, so I came to the conclusion that I needed a mutable CharMatcher. As with Bart Kiers's solution, it keeps the quote characters in the tokens.

    public static Splitter quotableComma() {
        return on(new CharMatcher() {
            private boolean inQuotes = false;

            @Override
            public boolean matches(char c) {
                if ('"' == c) {
                    inQuotes = !inQuotes;
                }
                if (inQuotes) {
                    return false;
                }
                return (',' == c);
            }
        });
    }

    @Test
    public void testQuotableComma() throws Exception {
        String toSplit = "a,b,\"c,d\",e";
        List<String> expected = ImmutableList.of("a", "b", "\"c,d\"", "e");
        Splitter splitter = Splitters.quotableComma();
        List<String> actual = ImmutableList.copyOf(splitter.split(toSplit));
        assertEquals(expected, actual);
    }


You could split on the following pattern:

 \s*,\s*(?=((\\["\\]|[^"\\])*"(\\["\\]|[^"\\])*")*(\\["\\]|[^"\\])*$) 

which might look (a little) friendlier with the (?x) flag:

    (?x)             # enable comments, ignore space-literals
    \s*,\s*          # match a comma optionally surrounded by space-chars
    (?=              # start positive look ahead
      (              #   start group 1
        (            #     start group 2
          \\["\\]    #       match an escaped quote or backslash
          |          #       OR
          [^"\\]     #       match any char other than a quote or backslash
        )*           #     end group 2, and repeat it zero or more times
        "            #     match a quote
        (            #     start group 3
          \\["\\]    #       match an escaped quote or backslash
          |          #       OR
          [^"\\]     #       match any char other than a quote or backslash
        )*           #     end group 3, and repeat it zero or more times
        "            #     match a quote
      )*             #   end group 1, and repeat it zero or more times
      (              #   open group 4
        \\["\\]      #     match an escaped quote or backslash
        |            #     OR
        [^"\\]       #     match any char other than a quote or backslash
      )*             #   end group 4, and repeat it zero or more times
      $              #   match the end-of-input
    )                # end positive look ahead

But even in this commented version, it's still a monster. In plain English, this regular expression can be explained as follows:

Match a comma that is optionally surrounded by space characters, but only if, looking ahead of that comma (all the way to the end of the string!), there are zero or an even number of quotes, ignoring escaped quotes and escaped backslashes.

So, after seeing this, you might agree with ColinD (I do!) that using some sort of CSV parser is the way to go in this case.

Note that the above expression leaves the quotes around the tokens, i.e. the string a,b,"c,d\"",e (as a literal: "a,b,\"c,d\\\"\",e" ) will be split as follows:

    a
    b
    "c,d\""
    e
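
To try the pattern without pulling in Guava, here is a small sketch that uses java.util.regex directly; Splitter.onPattern accepts the same pattern string. The class name LookaheadCommaSplit and the constant UNIT are illustrative only. UNIT is the repeated "escaped char or ordinary char" group, factored out to keep the Java string escaping readable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative wrapper around the look-ahead pattern shown above.
class LookaheadCommaSplit {
    // (\\["\\]|[^"\\])*  : any run of escaped quotes/backslashes or ordinary chars
    private static final String UNIT = "(\\\\[\"\\\\]|[^\"\\\\])*";

    // \s*,\s*(?=(UNIT"UNIT")*UNIT$) : a comma followed by an even number of unescaped quotes
    static final Pattern DELIMITER = Pattern.compile(
            "\\s*,\\s*(?=(" + UNIT + "\"" + UNIT + "\")*" + UNIT + "$)");

    static List<String> split(String input) {
        // Note: Pattern.split drops trailing empty strings.
        return Arrays.asList(DELIMITER.split(input));
    }

    public static void main(String[] args) {
        // Prints the four tokens listed above, one per line; the quotes are kept.
        for (String token : split("a,b,\"c,d\\\"\",e")) {
            System.out.println(token);
        }
    }
}
```

Because the look-ahead rescans the rest of the input for every comma, the cost grows quadratically with the length of the string, which is one more argument for a real CSV parser on large inputs.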

A slight improvement on @Rage-Steel's answer:

    static final CharMatcher notQuoted = new CharMatcher() {
        private boolean inQuotes = false;

        @Override
        public boolean matches(char c) {
            // Toggle the state on every quote character.
            if ('"' == c) {
                inQuotes = !inQuotes;
            }
            return !inQuotes;
        }
    };

    // A delimiter counts only while we are outside quotes.
    static final Splitter SPLITTER = Splitter.on(notQuoted.and(CharMatcher.anyOf(" ,;|")))
            .trimResults()
            .omitEmptyStrings();

And then,

    public static void main(String[] args) {
        final String toSplit = "a=bc=d,kuku=\"e=f|g=h something=other\"";
        List<String> sputnik = SPLITTER.splitToList(toSplit);
        for (String s : sputnik)
            System.out.println(s);
    }

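Mind the thread safety (or, to put it simply, there is none): the matcher keeps its quote state in a field, so a shared SPLITTER must not be used from several threads at once.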
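
One way to sidestep the thread-safety concern is to keep the quote state in a local variable instead of in a shared matcher. The following is a minimal plain-Java sketch of that idea, not Guava API; the class QuoteAwareScanner and its split method are hypothetical names, and the trimming and dropping of empty tokens only approximates trimResults() and omitEmptyStrings().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-rolled splitter: all state lives on the stack of each call.
class QuoteAwareScanner {
    static List<String> split(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false; // local, so concurrent calls cannot interfere
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (!inQuotes && delimiters.indexOf(c) >= 0) {
                flush(tokens, current); // delimiter outside quotes ends the token
            } else {
                current.append(c);      // quotes themselves are kept in the token
            }
        }
        flush(tokens, current);
        return tokens;
    }

    // Adds the trimmed token unless it is empty, then resets the buffer.
    private static void flush(List<String> tokens, StringBuilder current) {
        String token = current.toString().trim();
        if (!token.isEmpty()) {
            tokens.add(token);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        String toSplit = "a=bc=d,kuku=\"e=f|g=h something=other\"";
        for (String s : split(toSplit, " ,;|")) {
            System.out.println(s);
        }
    }
}
```

With Guava itself, a comparable effect could be had by building a fresh matcher and Splitter per call rather than sharing static instances.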
