How can I process a large file through CSVParser?

I have a large .csv file (about 300 MB) that is read from a remote host and copied to a target file, but I do not need every line in the target file. While copying, I need to read each line from the source and write it to the target file only if it passes some predicate.

As far as I can tell, Apache Commons CSV (org.apache.commons.csv) works on the whole file

    CSVFormat csvFileFormat = CSVFormat.EXCEL.withHeader();
    // CSVParser's constructor takes a Reader, not a path string
    CSVParser csvFileParser = new CSVParser(new FileReader("filePath"), csvFileFormat);
    List<CSVRecord> csvRecords = csvFileParser.getRecords();

therefore, I cannot simply read line by line with a BufferedReader. With my code, a new CSVParser would have to be created for each line, which looks inefficient.

How can I parse a single row (with a known table header) in the above case?

java large-files csv filtering apache-commons-csv
2 answers

No matter what you do, all of the data in your file has to reach your local machine, because your system must parse it to determine whether each line qualifies. Whether the file arrives as a stream read through the parser (so that you can parse each line as it comes in) or you copy the entire file first and parse it afterwards, all of it ends up locally. You need to get the data locally and then trim the excess.

Calling csvFileParser.getRecords() is already a lost battle, because the documentation explains that this method loads every row of your file into memory. To parse the records while keeping memory use low, iterate over them instead; the documentation implies that the following code loads one record into memory at a time:

    // CSVParser.parse(File, ...) also needs a Charset; UTF-8 is assumed here.
    CSVParser csvFileParser = CSVParser.parse(new File("filePath"), StandardCharsets.UTF_8, csvFileFormat);
    for (CSVRecord csvRecord : csvFileParser) {
        ... // qualify the csvRecord; output qualified row to new file and flush as needed.
    }

Since you explained that "filePath" is not local, the above solution is prone to failure from connectivity issues. To avoid them, I recommend that you copy the entire remote file to a local one, make sure the file was copied accurately by comparing checksums, parse the local copy to create the target file, and then delete the local copy when you are done.
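The copy-and-verify step could look roughly like the sketch below. It uses only the JDK and assumes that the remote host can give you a SHA-256 checksum of the file; the class name ChecksumCheck, both method names, and the choice of SHA-256 are illustrative assumptions, not something from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: names and the SHA-256 choice are assumptions for illustration.
class ChecksumCheck {

    // Streams the file through the digest so the whole file is never held in memory.
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Copies the remote stream to a temporary local file and returns its path
    // only if the checksum matches; otherwise the partial copy is deleted.
    static Path copyVerified(InputStream remote, String expectedSha256Hex)
            throws IOException, NoSuchAlgorithmException {
        Path local = Files.createTempFile("source-", ".csv");
        try {
            Files.copy(remote, local, StandardCopyOption.REPLACE_EXISTING);
            if (!sha256Hex(local).equalsIgnoreCase(expectedSha256Hex)) {
                throw new IOException("Checksum mismatch for " + local);
            }
            return local;
        } catch (IOException | NoSuchAlgorithmException | RuntimeException e) {
            Files.deleteIfExists(local);
            throw e;
        }
    }
}
```

After copyVerified returns, you would parse the returned local path with the iteration loop above and call Files.deleteIfExists on it once the target file has been written.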


This is a late answer, but you can use a BufferedReader together with CSVParser:

    // 10 MB read buffer
    try (BufferedReader reader = new BufferedReader(new FileReader(fileName), 1048576 * 10)) {
        Iterable<CSVRecord> records = CSVFormat.RFC4180.parse(reader);
        for (CSVRecord line : records) {
            // Process each line here
        }
    } catch (IOException e) {
        // handle exceptions from your BufferedReader here
    }