This is a good question. The problem was not obvious to me until I looked at the javadocs and realized that opencsv only supports a single character as the delimiter, not a string.
Here are a couple of recommended workarounds (the Groovy examples are easily converted to Java).
Ignore Implicit Intermediate Fields
Continue to use opencsv, but ignore the empty fields it produces. Obviously this is a cheat, but it works well enough for parsing well-formed data.
    // opencsv 2.x package; newer releases use com.opencsv.CSVParser instead
    import au.com.bytecode.opencsv.CSVParser

    CSVParser csv = new CSVParser((char) '|')
    String[] result = csv.parseLine('J||Project report||"F, G, I"||1')

    assert result[0] == "J"
    assert result[2] == "Project report"
    assert result[4] == "F, G, I"
    assert result[6] == "1"
or
    CSVParser csv = new CSVParser((char) '|')
    String[] result = csv.parseLine('J|||Project report|||"F, G, I"|||1')

    assert result[0] == "J"
    assert result[3] == "Project report"
    assert result[6] == "F, G, I"
    assert result[9] == "1"
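The same trick works when reading a whole file. Here is a minimal sketch under the same opencsv 2.x assumption, using a hypothetical input file data.txt whose rows use the '|||' delimiter: parse on the single '|' character and keep only every third field.

    import au.com.bytecode.opencsv.CSVReader

    new File('data.txt').withReader { reader ->
        new CSVReader(reader, '|' as char).readAll().each { String[] row ->
            // Real values sit at indices 0, 3, 6, ...; the rest are the empty
            // fields implied by the extra '|' characters of the '|||' delimiter.
            def fields = (0..<row.length).step(3).collect { row[it] }
            println fields
        }
    }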
Roll your own
Use Groovy's String tokenize method (built on java.util.StringTokenizer).
    def result = 'J|||Project report|||"F, G, I"|||1'.tokenize('|||')

    assert result[0] == "J"
    assert result[1] == "Project report"
    assert result[2] == "\"F, G, I\""
    assert result[3] == "1"
The disadvantage of this approach is that you lose the ability to strip quote characters or handle escaped delimiters.
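To illustrate the cost, here is a small sketch that cleans up the simple case of a field fully wrapped in double quotes; stripQuotes is a hypothetical helper, and a delimiter hiding inside the quotes would still be split on.

    // Hypothetical helper: strips one pair of surrounding double quotes.
    def stripQuotes = { String field ->
        if (field.length() > 1 && field.startsWith('"') && field.endsWith('"')) {
            return field[1..-2]
        }
        return field
    }

    def raw = 'J|||Project report|||"F, G, I"|||1'.tokenize('|||')
    assert raw.collect(stripQuotes) == ['J', 'Project report', 'F, G, I', '1']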
Update
Instead of pre-processing the data and changing its contents, why not combine both of the above approaches in a two-stage process:
- Use the "roll your own" approach as a first validation pass: split each row and make sure it contains the required number of fields.
- Use the "ignore fields" approach to parse the validated data, safe in the knowledge that each row has the expected number of fields (a combined sketch follows this list).
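A minimal sketch of that combination, under the same assumptions as the earlier examples (opencsv 2.x, the '|||' delimiter, four expected fields per record); parseValidatedRow is a hypothetical name:

    import au.com.bytecode.opencsv.CSVParser

    def parseValidatedRow = { String line ->
        // Stage 1: "roll your own" sanity check on the field count.
        assert line.tokenize('|||').size() == 4 : "Bad record: $line"
        // Stage 2: parse with opencsv and ignore the implicit empty fields.
        String[] raw = new CSVParser((char) '|').parseLine(line)
        (0..<raw.length).step(3).collect { raw[it] }
    }

    def fields = parseValidatedRow('J|||Project report|||"F, G, I"|||1')
    assert fields == ['J', 'Project report', 'F, G, I', '1']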
Not super efficient, but possibly easier than writing your own CSV parser :-)
Mark O'Connor