FileHelpers throws an OutOfMemoryException when parsing a large CSV file

I am trying to parse a very large CSV file with FileHelpers ( http://www.filehelpers.net/ ). The file is about 1 GB compressed and about 20 GB uncompressed.

    string fileName = @"c:\myfile.csv.gz";

    using (var fileStream = File.OpenRead(fileName))
    using (var gzipStream = new GZipStream(fileStream, CompressionMode.Decompress, false))
    using (TextReader textReader = new StreamReader(gzipStream))
    {
        var engine = new FileHelperEngine<CSVItem>();
        CSVItem[] items = engine.ReadStream(textReader);
    }
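For reference, CSVItem is assumed to be a FileHelpers record class along these lines; the actual fields are not shown in the question, so the ones below are only placeholders:

    using FileHelpers;

    // Hypothetical record layout: the real CSVItem fields are not in the question.
    [DelimitedRecord(",")]
    public class CSVItem
    {
        public int Id;

        [FieldQuoted('"', QuoteMode.OptionalForBoth)]
        public string Name;

        public decimal Price;
    }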

FileHelpers then throws an OutOfMemoryException.

Test failed: an exception of type "System.OutOfMemoryException" was thrown.

    System.OutOfMemoryException: Exception of type "System.OutOfMemoryException" was thrown.
       at System.Text.StringBuilder.ExpandByABlock(Int32 minBlockCharCount)
       at System.Text.StringBuilder.Append(Char value, Int32 repeatCount)
       at System.Text.StringBuilder.Append(Char value)
       at FileHelpers.StringHelper.ExtractQuotedString(LineInfo line, Char quoteChar, Boolean allowMultiline)
       at FileHelpers.DelimitedField.ExtractFieldString(LineInfo line)
       at FileHelpers.FieldBase.ExtractFieldValue(LineInfo line)
       at FileHelpers.RecordInfo.StringToRecord(LineInfo line, ...)
       at FileHelpers.FileHelperEngine`1.ReadStream(TextReader reader, Int32 maxRecords, DataTable dt)
       at FileHelpers.FileHelperEngine`1.ReadStream(TextReader reader)

Is it possible to parse a file this large with FileHelpers? If not, can anyone recommend another approach to parsing files of this size? Thanks.

2 answers

You must work record by record, like this:

    string fileName = @"c:\myfile.csv.gz";

    using (var fileStream = File.OpenRead(fileName))
    using (var gzipStream = new GZipStream(fileStream, CompressionMode.Decompress, false))
    using (TextReader textReader = new StreamReader(gzipStream))
    {
        var engine = new FileHelperAsyncEngine<CSVItem>();

        using (engine.BeginReadStream(textReader))
        {
            foreach (var record in engine)
            {
                // Work with each item
            }
        }
    }

If you use this async engine, you only use the memory of one record at a time, and it will be much faster.
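If the goal is to transform the data rather than hold it all in memory, the output can be streamed the same way. A minimal sketch, assuming FileHelpers' BeginWriteFile/WriteNext API and a made-up output path:

    using System.IO;
    using System.IO.Compression;
    using FileHelpers;

    string inputFile = @"c:\myfile.csv.gz";
    string outputFile = @"c:\myfile-processed.csv";   // hypothetical output path

    var readEngine = new FileHelperAsyncEngine<CSVItem>();
    var writeEngine = new FileHelperAsyncEngine<CSVItem>();

    using (var fileStream = File.OpenRead(inputFile))
    using (var gzipStream = new GZipStream(fileStream, CompressionMode.Decompress))
    using (TextReader textReader = new StreamReader(gzipStream))
    using (readEngine.BeginReadStream(textReader))
    using (writeEngine.BeginWriteFile(outputFile))
    {
        foreach (CSVItem record in readEngine)
        {
            // Transform or filter the record here; only one record lives in memory at a time.
            writeEngine.WriteNext(record);
        }
    }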


This is not a complete answer, but: if you have a 20 GB CSV file, you will need 20 GB+ of memory to hold it all at once, unless your reader keeps everything compressed in memory (unlikely). You need to read the file in chunks, and the solution you are using, which puts everything into an array, will not work unless you have an enormous amount of RAM.

You will need a loop more like this:

    // Pseudocode: read and process one row at a time instead of loading the whole file
    CsvReader reader = new CsvReader(filePath);
    CSVItem item = reader.ReadNextItem();

    while (item != null)
    {
        DoWhatINeedWithCsvRow(item);
        item = reader.ReadNextItem();
    }

C#'s memory management will then be smart enough to get rid of old CSVItems as you go through them, provided you don't keep references to them hanging around.

An even better version would be to read a chunk of the CSV (for example, 10,000 rows), process it all, then read the next chunk, or to create a task for DoWhatINeedWithCsvRow if you don't care about processing order. A sketch of that idea follows.
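Here is one way to do the chunked version, reusing the FileHelperAsyncEngine from the first answer; the 10,000-record batch size and the ProcessBatch method are illustrative assumptions (a stand-in for DoWhatINeedWithCsvRow over a batch), and this only helps when processing order does not matter:

    using System.Collections.Generic;
    using System.Threading.Tasks;
    using FileHelpers;

    const int BatchSize = 10000;
    var engine = new FileHelperAsyncEngine<CSVItem>();
    var tasks = new List<Task>();
    var batch = new List<CSVItem>(BatchSize);

    using (engine.BeginReadFile(@"c:\myfile.csv"))
    {
        foreach (CSVItem record in engine)
        {
            batch.Add(record);

            if (batch.Count == BatchSize)
            {
                var chunk = batch;                               // hand the full batch to a task
                tasks.Add(Task.Run(() => ProcessBatch(chunk)));  // ProcessBatch is hypothetical
                batch = new List<CSVItem>(BatchSize);            // start a fresh batch
            }
        }
    }

    if (batch.Count > 0)
        tasks.Add(Task.Run(() => ProcessBatch(batch)));          // process the final partial batch

    Task.WaitAll(tasks.ToArray());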

