I tried to reproduce the problem you're seeing. Since you can't share your data, I generated some test data instead. However, on my machine (.NET 4.6.2, F# 4.1) reading it doesn't take minutes; it takes a few seconds.
Perhaps you could try my sample application in your setup, see how it behaves, and we can work from there?
```fsharp
open System
open System.Diagnostics
open System.IO

// Milliseconds elapsed since the program started
let clock =
  let sw = Stopwatch ()
  sw.Start ()
  fun () -> sw.ElapsedMilliseconds

// Runs a and returns (elapsed ms, result)
let time a =
  let before = clock ()
  let v = a ()
  let after = clock ()
  after - before, v

let generateDataSet () =
  let random = Random 19740531

  let firstDate = DateTime(1970, 1, 1)

  let randomInt () = random.Next () |> int64 |> (+) 10000000000L |> string
  let randomDate () = (firstDate + (random.Next () |> float |> TimeSpan.FromSeconds)).ToString("s")
  // A quoted string of 0-15 printable characters, with '"' and '\' replaced by spaces
  let randomString () =
    let inline valid ch =
      match ch with
      | '"'
      | '\\' -> ' '
      | _ -> ch
    let c = random.Next () % 16
    let g i =
      if i = 0 || i = c + 1 then '"'
      else 32 + random.Next() % (127 - 32) |> char |> valid
    Array.init (c + 2) g |> String

  let columns =
    [|
      "Id"        , randomInt
      "ForeignId" , randomInt
      "BirthDate" , randomDate
      "OtherDate" , randomDate
      "FirstName" , randomString
      "LastName"  , randomString
    |]

  use sw = new StreamWriter ("perf.csv")
  let headers = columns |> Array.map fst |> String.concat ";"
  sw.WriteLine headers
  for i = 0 to 700000 do
    let values = columns |> Array.map (fun (_, f) -> f ()) |> String.concat ";"
    sw.WriteLine values

open FSharp.Data

[<Literal>]
let sample = """Id;ForeignId;BirthDate;OtherDate;FirstName;LastName
11795679844;10287417237;2028-09-14T20:33:17;1993-07-21T17:03:25;", xS@ %aY)N*})Z";"ZP~;"
11127366946;11466785219;2028-02-22T08:39:57;2026-01-24T05:07:53;"H-/QA(";"g8}J?k~"
"""

type PerfFile = CsvProvider<sample, ";">

let readDataWithTp () =
  use streamReader = new StreamReader ("perf.csv")
  let csvFile = PerfFile.Load streamReader
  let length = csvFile.Rows |> Seq.length
  printfn "%A" length

[<EntryPoint>]
let main argv =
  Environment.CurrentDirectory <- AppDomain.CurrentDomain.BaseDirectory

  printfn "Generating dataset..."
  let ms, _ = time generateDataSet
  printfn "  took %d ms" ms

  printfn "Reading dataset..."
  let ms, _ = time readDataWithTp
  printfn "  took %d ms" ms

  0
```
Performance numbers (.NET 4.6.2 on my desktop):
```text
Generating dataset...
  took 2162 ms
Reading dataset...
  took 6156 ms
```
Performance numbers (Mono 4.6.2 on my MacBook Pro):
```text
Generating dataset...
  took 4432 ms
Reading dataset...
  took 8304 ms
```
Update
It turns out that specifying a `Culture` for the CsvProvider clearly degrades performance. It can be any culture, not just `sv-SE`. But why?
If you inspect the code the provider generates for the fast and slow cases, you'll notice a difference:
Fast
```csharp
internal sealed class csvFile@78
{
    internal System.Tuple<long, long, System.DateTime, System.DateTime, string, string> Invoke(object arg1, string[] arg2)
    {
        Microsoft.FSharp.Core.FSharpOption<string> fSharpOption = TextConversions.AsString(arg2[0]);
        long arg_C9_0 = TextRuntime.GetNonOptionalValue<long>("Id", TextRuntime.ConvertInteger64("", fSharpOption), fSharpOption);
        fSharpOption = TextConversions.AsString(arg2[1]);
        long arg_C9_1 = TextRuntime.GetNonOptionalValue<long>("ForeignId", TextRuntime.ConvertInteger64("", fSharpOption), fSharpOption);
        fSharpOption = TextConversions.AsString(arg2[2]);
        System.DateTime arg_C9_2 = TextRuntime.GetNonOptionalValue<System.DateTime>("BirthDate", TextRuntime.ConvertDateTime("", fSharpOption), fSharpOption);
        fSharpOption = TextConversions.AsString(arg2[3]);
        System.DateTime arg_C9_3 = TextRuntime.GetNonOptionalValue<System.DateTime>("OtherDate", TextRuntime.ConvertDateTime("", fSharpOption), fSharpOption);
        fSharpOption = TextConversions.AsString(arg2[4]);
        string arg_C9_4 = TextRuntime.GetNonOptionalValue<string>("FirstName", TextRuntime.ConvertString(fSharpOption), fSharpOption);
        fSharpOption = TextConversions.AsString(arg2[5]);
        return new System.Tuple<long, long, System.DateTime, System.DateTime, string, string>(arg_C9_0, arg_C9_1, arg_C9_2, arg_C9_3, arg_C9_4, TextRuntime.GetNonOptionalValue<string>("LastName", TextRuntime.ConvertString(fSharpOption), fSharpOption));
    }
}
```
Slow
```csharp
internal sealed class csvFile@78
{
    internal System.Tuple<long, long, System.DateTime, System.DateTime, string, string> Invoke(object arg1, string[] arg2)
    {
        Microsoft.FSharp.Core.FSharpOption<string> fSharpOption = TextConversions.AsString(arg2[0]);
        long arg_C9_0 = TextRuntime.GetNonOptionalValue<long>("Id", TextRuntime.ConvertInteger64("sv-SE", fSharpOption), fSharpOption);
        fSharpOption = TextConversions.AsString(arg2[1]);
        long arg_C9_1 = TextRuntime.GetNonOptionalValue<long>("ForeignId", TextRuntime.ConvertInteger64("sv-SE", fSharpOption), fSharpOption);
        fSharpOption = TextConversions.AsString(arg2[2]);
        System.DateTime arg_C9_2 = TextRuntime.GetNonOptionalValue<System.DateTime>("BirthDate", TextRuntime.ConvertDateTime("sv-SE", fSharpOption), fSharpOption);
        fSharpOption = TextConversions.AsString(arg2[3]);
        System.DateTime arg_C9_3 = TextRuntime.GetNonOptionalValue<System.DateTime>("OtherDate", TextRuntime.ConvertDateTime("sv-SE", fSharpOption), fSharpOption);
        fSharpOption = TextConversions.AsString(arg2[4]);
        string arg_C9_4 = TextRuntime.GetNonOptionalValue<string>("FirstName", TextRuntime.ConvertString(fSharpOption), fSharpOption);
        fSharpOption = TextConversions.AsString(arg2[5]);
        return new System.Tuple<long, long, System.DateTime, System.DateTime, string, string>(arg_C9_0, arg_C9_1, arg_C9_2, arg_C9_3, arg_C9_4, TextRuntime.GetNonOptionalValue<string>("LastName", TextRuntime.ConvertString(fSharpOption), fSharpOption));
    }
}
```
More specifically, this difference:
```csharp
// Fast
TextRuntime.ConvertDateTime("", fSharpOption)
// Slow
TextRuntime.ConvertDateTime("sv-SE", fSharpOption)
```
When a culture is specified, its name is passed to `ConvertDateTime`, which turns it into a `CultureInfo` by calling `GetCulture`:
```fsharp
static member GetCulture(cultureStr) =
  if String.IsNullOrWhiteSpace cultureStr
  then CultureInfo.InvariantCulture
  else CultureInfo cultureStr
```
This means that in the default case we get the shared `CultureInfo.InvariantCulture`, but for any other culture a brand-new `CultureInfo` object is created for every converted field of every row. This could be cached, but it isn't. Constructing the object doesn't seem to take much time by itself, but something expensive happens each time we work with a fresh `CultureInfo` object.
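To make the "new object every time" point concrete, the snippet below compares the `CultureInfo` constructor, which allocates a new instance on each call, with `CultureInfo.GetCultureInfo`, which the .NET documentation describes as returning a cached, read-only instance. It is only a sketch showing object identity; by itself it doesn't prove which part of the parse is the slow one.

```fsharp
open System.Globalization

// Each constructor call allocates a brand-new CultureInfo.
let fresh1 = CultureInfo "sv-SE"
let fresh2 = CultureInfo "sv-SE"

// GetCultureInfo hands back a cached, read-only instance for the same name.
let cached1 = CultureInfo.GetCultureInfo "sv-SE"
let cached2 = CultureInfo.GetCultureInfo "sv-SE"

let freshIsSame = obj.ReferenceEquals (fresh1, fresh2)
let cachedIsSame = obj.ReferenceEquals (cached1, cached2)

printfn "fresh: %b, cached: %b" freshIsSame cachedIsSame
```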
Parsing a `DateTime` in FSharp.Data essentially boils down to this:
```fsharp
let dateTimeStyles = DateTimeStyles.AllowWhiteSpaces ||| DateTimeStyles.RoundtripKind
match DateTime.TryParse(text, cultureInfo, dateTimeStyles) with
// ...
```
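For a concrete feel of that call, here is a small stand-alone version that parses one of the `BirthDate` values from the sample CSV with the same `DateTimeStyles` flags, once with the invariant culture and once with `sv-SE`. The helper `tryParseDate` is a hypothetical name made up for this example, not FSharp.Data's API.

```fsharp
open System
open System.Globalization

// Same flags as in the FSharp.Data snippet.
let dateTimeStyles = DateTimeStyles.AllowWhiteSpaces ||| DateTimeStyles.RoundtripKind

// Hypothetical helper: wraps the out-parameter overload of DateTime.TryParse in an option.
let tryParseDate (culture: CultureInfo) (text: string) =
  match DateTime.TryParse(text, culture, dateTimeStyles) with
  | true, d -> Some d
  | false, _ -> None

// A BirthDate value from the sample data, padded with whitespace (allowed by AllowWhiteSpaces).
let text = " 2028-09-14T20:33:17 "
let invariantResult = tryParseDate CultureInfo.InvariantCulture text
let swedishResult = tryParseDate (CultureInfo.GetCultureInfo "sv-SE") text

printfn "%A / %A" invariantResult swedishResult
```

Both cultures accept this ISO 8601 value, so the culture only changes which `CultureInfo` object the parser consults, not the result.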
So let's run a performance test: one run reuses a cached `CultureInfo` object, the other creates a new one on every iteration.
```fsharp
open System
open System.Diagnostics
open System.Globalization

let clock =
  let sw = Stopwatch ()
  sw.Start ()
  fun () -> sw.ElapsedMilliseconds

let time a =
  let before = clock ()
  let v = a ()
  let after = clock ()
  after - before, v

// Parses the same date string c times, asking cf for a CultureInfo on every iteration
let perfTest c cf () =
  let dateTimeStyles = DateTimeStyles.AllowWhiteSpaces ||| DateTimeStyles.RoundtripKind
  let text = DateTime.Now.ToString ("", cf ())
  for i = 1 to c do
    let culture = cf ()
    DateTime.TryParse(text, culture, dateTimeStyles) |> ignore

[<EntryPoint>]
let main argv =
  Environment.CurrentDirectory <- AppDomain.CurrentDomain.BaseDirectory

  let ct = "sv-SE"
  let cct = CultureInfo ct
  let count = 10000

  printfn "Using cached CultureInfo object..."
  let ms, _ = time (perfTest count (fun () -> cct))
  printfn "  took %d ms" ms

  printfn "Using fresh CultureInfo object..."
  let ms, _ = time (perfTest count (fun () -> CultureInfo ct))
  printfn "  took %d ms" ms

  0
```
Performance numbers (.NET 4.6.2, F# 4.1):
```text
Using cached CultureInfo object...
  took 16 ms
Using fresh CultureInfo object...
  took 5328 ms
```
So caching the `CultureInfo` object in FSharp.Data should significantly improve the performance of CsvProvider when a culture is specified.
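A minimal sketch of what such caching could look like, assuming a memoized replacement for `GetCulture` keyed on the culture name. `getCultureCached` and `cultureCache` are hypothetical names, not part of FSharp.Data, and this is not a patch against its actual source.

```fsharp
open System
open System.Collections.Concurrent
open System.Globalization

// Hypothetical cache: one CultureInfo per culture name, shared by every row and field.
let cultureCache = ConcurrentDictionary<string, CultureInfo>()

// Hypothetical drop-in for GetCulture: same semantics, but the CultureInfo
// for a given name is created once and reused on later calls.
let getCultureCached (cultureStr: string) =
  if String.IsNullOrWhiteSpace cultureStr then CultureInfo.InvariantCulture
  else cultureCache.GetOrAdd(cultureStr, Func<string, CultureInfo>(fun name -> CultureInfo name))

// Repeated lookups for the same name now return the same object.
let a = getCultureCached "sv-SE"
let b = getCultureCached "sv-SE"
printfn "same instance: %b" (obj.ReferenceEquals (a, b))
```

Sharing one instance should be fine here because parsing only reads from it. Calling `CultureInfo.GetCultureInfo` instead of the constructor would give a similar effect without a custom cache, since it already returns cached, read-only instances.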