Java CSV parser with string (multi-character) delimiters

Is there an open-source Java library for CSV that supports multi-character delimiters (i.e. a String with length > 1)?

Strictly speaking, CSV means comma-separated values, with a single character (',') as the delimiter. In practice many other single-character alternatives are used (such as a tab), which effectively turns CSV into "Character-Separated Values" (essentially DSV: delimiter-separated values).

The main open-source Java CSV libraries (for example, OpenCSV) accept almost any character as a delimiter, but not a string (a multi-character delimiter). So for data separated by strings such as "|||" there is no option other than preprocessing the input to convert the string delimiter into a single-character one. After that, the data can be parsed as ordinary single-character-delimited values.

It would therefore be nice to have a library that supports string delimiters natively, so that no preprocessing is required. CSV would then stand for "CharSequence-Separated Values" :-)

2 answers

This is a good question. The problem was not obvious to me until I looked at the javadocs and realized that opencsv only supports a single character as a delimiter, not a string.

Here are a couple of suggested workarounds (the Groovy examples can be converted to Java).

Ignore the empty intermediate fields

Keep using OpenCSV with the single character as the delimiter, but ignore the empty fields that appear between the real ones. Obviously this is a cheat, but it works fine for parsing well-formed data.

  CSVParser csv = new CSVParser((char)'|')
  String[] result = csv.parseLine('J||Project report||"F, G, I"||1')
  assert result[0] == "J"
  assert result[2] == "Project report"
  assert result[4] == "F, G, I"
  assert result[6] == "1"

or

  CSVParser csv = new CSVParser((char)'|')
  String[] result = csv.parseLine('J|||Project report|||"F, G, I"|||1')
  assert result[0] == "J"
  assert result[3] == "Project report"
  assert result[6] == "F, G, I"
  assert result[9] == "1"
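The index pattern above generalizes: when the real delimiter is k copies of one character, the real fields sit at indices 0, k, 2k, ... of a single-character parse. The following is a minimal Java sketch of a helper that collapses such an array and checks that the skipped "gap" fields really are empty. The class and method names are hypothetical; in practice the input array would come from opencsv's parseLine, but to keep the sketch free of dependencies it uses a plain String.split as a stand-in, which does no quote handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollapseEmptyFields {

    // Given the output of a single-character parse (e.g. on '|') of data that is
    // really delimited by k repetitions of that character (e.g. "|||", k = 3),
    // keep every k-th field and verify that the fields in between are empty.
    static String[] collapse(String[] raw, int k) {
        if ((raw.length - 1) % k != 0) {
            throw new IllegalArgumentException("unexpected field count: " + raw.length);
        }
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < raw.length; i++) {
            if (i % k == 0) {
                fields.add(raw[i]);          // a real field
            } else if (!raw[i].isEmpty()) {  // should be an empty gap field
                throw new IllegalArgumentException("non-empty gap field at index " + i);
            }
        }
        return fields.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Stand-in for opencsv's parseLine: a plain split on '|' (no quote handling).
        String[] raw = "J|||Project report|||1".split("\\|", -1);
        System.out.println(Arrays.toString(collapse(raw, 3))); // [J, Project report, 1]
    }
}
```

The emptiness check is what keeps the cheat honest: a stray single '|' inside the data makes the helper fail loudly instead of silently shifting fields.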

Roll your own

Use Groovy's String.tokenize() method, which is built on Java's StringTokenizer.

  def result = 'J|||Project report|||"F, G, I"|||1'.tokenize('|||')
  assert result[0] == "J"
  assert result[1] == "Project report"
  assert result[2] == "\"F, G, I\""
  assert result[3] == "1"

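The disadvantage of this approach is that you lose quote handling, i.e. the ability to have delimiters embedded inside quoted values (note that the quotes stay in result[2]). Also be aware that tokenize() treats its argument as a set of delimiter characters rather than a single string, and drops empty tokens; '|||' therefore behaves exactly like '|', which only works here because the empty tokens in between are discarded.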
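A rough Java equivalent of the roll-your-own approach, sketched here with String.split and Pattern.quote instead of tokenize (a swapped-in technique, not what the Groovy snippet does). Pattern.quote makes the delimiter literal, and a limit of -1 keeps empty fields, so "|||" is matched as a whole string. Like tokenize, it still knows nothing about quotes. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LiteralSplit {

    // Split on a literal multi-character delimiter. Pattern.quote stops "|||"
    // from being read as a regex (where '|' means alternation), and the limit
    // of -1 keeps trailing empty fields instead of dropping them.
    static String[] split(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        String[] result = split("J|||Project report|||\"F, G, I\"|||1", "|||");
        System.out.println(Arrays.toString(result)); // [J, Project report, "F, G, I", 1]

        // Unlike tokenize, an empty field between two delimiters is kept.
        System.out.println(Arrays.toString(split("a||||||c", "|||"))); // [a, , c]
    }
}
```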

Update

Instead of pre-processing the data (which changes its contents), why not combine the two approaches above in a two-stage process:

  • Use "roll your own" for a first validation pass: split each row and make sure it contains the expected number of fields.
  • Use the "ignore empty fields" approach to parse the validated data, safe in the knowledge that each row has the correct number of fields.
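The first stage of that process could look roughly like the Java sketch below. It assumes the expected field count is known in advance, uses a literal Pattern.quote split rather than Groovy's tokenize, and reports the 1-based numbers of the offending lines; the class and method names are hypothetical. Quotes are ignored at this stage, so it is only a sanity check before the real parse.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class FieldCountCheck {

    // Stage 1: split each line on the literal delimiter and collect the
    // (1-based) numbers of lines whose field count differs from the expected one.
    static List<Integer> badLines(List<String> lines, String delimiter, int expectedFields) {
        Pattern delim = Pattern.compile(Pattern.quote(delimiter));
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (delim.split(lines.get(i), -1).length != expectedFields) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "J|||Project report|||\"F, G, I\"|||1",
                "K|||Budget|||2");                          // one field short
        System.out.println(badLines(lines, "|||", 4));   // [2]
    }
}
```

Only if this list is empty would stage 2 (the single-character parse with the gap fields ignored) run.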

Not very efficient, but probably easier than writing your own CSV parser :-)
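For comparison, a hand-written parser that understands both quotes and a string delimiter does not have to be large. The following is a minimal Java sketch, not a library API: the class name is hypothetical, it parses a single line only (no newlines inside quoted fields), treats a doubled quote ("") inside quotes as a literal quote in the RFC 4180 style, and is lenient about malformed quoting.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiCharDelimiterParser {

    // Parse one line whose fields are separated by a multi-character delimiter.
    // Quoted fields may contain the delimiter; "" inside quotes is a literal ".
    static List<String> parseLine(String line, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"');            // escaped quote
                    i += 2;
                } else if (c == '"') {
                    inQuotes = false;             // closing quote
                    i++;
                } else {
                    field.append(c);              // delimiter text is kept inside quotes
                    i++;
                }
            } else if (c == '"') {
                inQuotes = true;                  // opening quote
                i++;
            } else if (line.startsWith(delimiter, i)) {
                fields.add(field.toString());     // end of the current field
                field.setLength(0);
                i += delimiter.length();
            } else {
                field.append(c);
                i++;
            }
        }
        fields.add(field.toString());             // last field
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("J|||Project report|||\"F, G, I\"|||1", "|||"));
        System.out.println(parseLine("a|||\"x|||y\"|||", "|||")); // [a, x|||y, ]
    }
}
```

The key design point is line.startsWith(delimiter, i): the delimiter is only recognized outside quotes and is matched as a whole string, which is exactly what the single-character libraries lack.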


Try opencsv.

It does everything you need, including (and especially) handling delimiters embedded within quoted values (for example, "a,b", "c" is parsed as ["a,b", "c"]).

I used it successfully and I liked it.

Edit:

Since opencsv only handles single-character delimiters, you can work around this:

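  String input = readInput();       // hypothetical helper returning the raw data
  char someCharNotInInput = '|';
  String delimiter = "abc";         // or whatever
  // Strings are immutable, so keep the result; replace() is literal,
  // whereas replaceAll() would interpret the delimiter as a regex
  String converted = input.replace(delimiter, String.valueOf(someCharNotInInput));
  // CSVReader wraps a Reader, not a String
  CSVReader reader = new CSVReader(new StringReader(converted), someCharNotInInput);
  // etc
  // For each String value read: put the delimiter back, in case it appeared inside a quoted value
  value = value.replace(String.valueOf(someCharNotInInput), delimiter);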
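The preprocessing step on its own can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, ';' is an arbitrary replacement character chosen for the example, and the helper refuses to run if that character already occurs in the input (otherwise the later single-character parse would split in the wrong places). The last line shows why a literal replace matters: as a regex, "|||" consists only of empty alternatives and matches the empty string at every position.

```java
public class DelimiterSubstitution {

    // Replace a multi-character delimiter with a single character that does not
    // occur in the input, so a single-character CSV parser can be used afterwards.
    // String.replace is literal; replaceAll would treat the delimiter as a regex.
    static String toSingleCharDelimiter(String input, String delimiter, char replacement) {
        if (input.indexOf(replacement) >= 0) {
            throw new IllegalArgumentException("replacement char already occurs in input");
        }
        return input.replace(delimiter, String.valueOf(replacement));
    }

    public static void main(String[] args) {
        String input = "J|||Project report|||1";
        System.out.println(toSingleCharDelimiter(input, "|||", ';')); // J;Project report;1

        // Pitfall: replaceAll("|||", ...) inserts ';' between every character.
        System.out.println(input.replaceAll("|||", ";").startsWith(";J;")); // true
    }
}
```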
