CsvProvider throws an OutOfMemoryException

FAOCropsLivestock.csv contains over 14 million lines. In my .fs file I declared

 type FAO = CsvProvider<"c:\FAOCropsLivestock.csv"> 

and then tried each of the following:

 FAO.GetSample().Rows.Where(fun x -> x.Country = country) |> ....
 FAO.GetSample().Filter(fun x -> x.Country = country) |> ....

In both cases, an OutOfMemoryException was thrown.

I also tried the following code after importing the CSV file into MS SQL Server:

 type Schema = SqlDataConnection<conStr>
 let db = Schema.GetDataContext()
 db.FAOCropsLivestock.Where(fun x -> x.Country = country) |> ....

This works. It also works if I query the file through OleDb, but that is slow.

How can I get a sequence out of it using CsvProvider?

1 answer

If you look at the bottom of the CSV type provider documentation, you will see a section on handling large datasets. As explained there, you can set CacheRows = false, which stops the provider from caching rows in memory and so helps when processing large files.

 type FAO = CsvProvider<"c:\FAOCropsLivestock.csv", CacheRows = false> 

You can then use standard sequence operations on the CSV rows without loading the entire file into memory, e.g.

 FAO.GetSample().Rows |> Seq.filter (fun x -> x.Country = country) |> .... 

However, you should enumerate the rows only once.
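As a rough illustration of the single-pass constraint, here is a minimal sketch. It assumes the FSharp.Data package is referenced and that the Country column is inferred as a string. The value bound to country is a made-up placeholder. The idea is to pull the rows you need into an array during one pass and do all further work on that array, rather than going back to Rows a second time.

```fsharp
open FSharp.Data

// CacheRows = false makes the provider stream rows instead of caching them.
type FAO = CsvProvider<"c:\FAOCropsLivestock.csv", CacheRows = false>

// Hypothetical filter value, for illustration only.
let country = "Brazil"

// One pass over the file: keep only the matching rows in memory.
let rows =
    FAO.GetSample().Rows
    |> Seq.filter (fun x -> x.Country = country)
    |> Seq.toArray

// Later steps work on the in-memory array, not on Rows again.
printfn "%d rows for %s" rows.Length country
```

This only helps if the filtered subset itself fits in memory. If it does not, fold or aggregate directly inside the single pipeline instead of materializing an array.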
