Don't use a regex to split CSV lines. That is asking for trouble ;) Just parse the line character by character. Here's an example:
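    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Pattern;

    public static List<List<String>> parseCsv(InputStream input, char separator) throws IOException {
        BufferedReader reader = null;
        List<List<String>> csv = new ArrayList<List<String>>();
        // Quote the separator so that regex metacharacters such as '|' are matched literally.
        String trailingSeparator = Pattern.quote(String.valueOf(separator)) + "$";
        try {
            reader = new BufferedReader(new InputStreamReader(input, "UTF-8"));
            for (String record; (record = reader.readLine()) != null;) {
                boolean quoted = false;
                StringBuilder fieldBuilder = new StringBuilder();
                List<String> fields = new ArrayList<String>();
                for (int i = 0; i < record.length(); i++) {
                    char c = record.charAt(i);
                    fieldBuilder.append(c);
                    if (c == '"') {
                        quoted = !quoted; // Entering or leaving a quoted section.
                    }
                    // A field ends at an unquoted separator or at the end of the record.
                    if ((!quoted && c == separator) || i + 1 == record.length()) {
                        fields.add(fieldBuilder.toString()
                            .replaceAll(trailingSeparator, "") // Trim the trailing separator.
                            .replaceAll("^\"|\"$", "")         // Trim the surrounding quotes.
                            .replace("\"\"", "\"")             // Unescape doubled quotes.
                            .trim());
                        fieldBuilder = new StringBuilder();
                    }
                    // A record ending with a separator has one more, empty, field.
                    if (c == separator && i + 1 == record.length()) {
                        fields.add("");
                    }
                }
                csv.add(fields);
            }
        } finally {
            if (reader != null) try { reader.close(); } catch (IOException logOrIgnore) {}
        }
        return csv;
    }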
Yes, there is a little regex involved, but it only trims off the trailing separator and the surrounding quotes of a single field.
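To illustrate why splitting on the separator alone is asking for trouble, here is a minimal sketch that compares a naive split with a quote-aware scan on a record containing a quoted separator. The class name `CsvSplitDemo` and both helper methods are made up for this illustration; they are not part of the parser above and they skip its trimming step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class, only meant to contrast the two approaches.
class CsvSplitDemo {

    // Naive approach: split on every separator, ignoring quotes entirely.
    static List<String> naiveSplit(String record, char separator) {
        return Arrays.asList(record.split(Pattern.quote(String.valueOf(separator)), -1));
    }

    // Quote-aware approach: walk the record and split only on separators outside quotes.
    static List<String> quoteAwareSplit(String record, char separator) {
        List<String> fields = new ArrayList<String>();
        StringBuilder field = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (c == '"') {
                if (quoted && i + 1 < record.length() && record.charAt(i + 1) == '"') {
                    field.append('"'); // A doubled quote inside a quoted field is a literal quote.
                    i++;
                } else {
                    quoted = !quoted;  // Entering or leaving a quoted section.
                }
            } else if (c == separator && !quoted) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString()); // The last field has no trailing separator.
        return fields;
    }

    public static void main(String[] args) {
        String record = "a,\"b,c\",d";
        System.out.println(naiveSplit(record, ','));      // 4 pieces: the quoted field is cut in two
        System.out.println(quoteAwareSplit(record, ',')); // 3 fields: the quoted comma is kept
    }
}
```

The naive split cuts the quoted field `"b,c"` in two, while the scan keeps it as one field because it tracks whether it is inside quotes.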
Alternatively, you can grab a third-party Java CSV API.
Balusc