F# vs C# performance signatures with sample code

There is already a lot of discussion on this topic, but I'm all about flogging dead horses, especially when I discover that they can still breathe.

I was working on parsing an unusual and exotic file format known as CSV, and for fun I decided to characterize its performance across the two .NET languages I know: C# and F#.

The results were... alarming. F# won by a large margin, a factor of 2 or more (and I actually think it's more like .5n, but getting real benchmarks is tough since I'm testing against hardware I/O).

Such divergent performance characteristics in something as commonplace as reading a CSV file surprise me (note that the coefficient means C# wins on very small files; the more testing I do, the more it feels like C# performs worse, which is both surprising and worrying, since it probably means I'm doing something wrong).

Some notes: Core 2 Duo laptop, 80 GB spindle drive, 3 GB of DDR 800 memory, Windows 7 64-bit Premium, .NET 4, not plugged in.

30,000 lines, 5 fields wide, each field a phrase of 10 characters or less, gives me a factor of 3 in favor of tail-call recursion after the first run (the file appears to get cached).

300,000 lines (the same data repeated) is a factor of 2 in favor of tail-call recursion, with F#'s mutable implementation winning out slightly, but the performance signatures indicate that I'm hitting the disk and not RAM-disking the whole file, which causes semi-random bursts of performance.

F# code

    //Module used to import data from an arbitrary CSV source
    module CSVImport

    open System.IO

    //imports the data from a path into a list of string arrays
    let ImportData (path:string) : List<string []> =

        //recursively rips through the file, grabbing a line and adding it to the list
        let rec readline (reader:StreamReader) (lines:List<string []>) : List<string []> =
            let line = reader.ReadLine()
            match line with
            | null -> lines
            | _ -> readline reader (line.Split(',')::lines)

        //grab a file and open it, then return the parsed data
        use chaosfile = new StreamReader(path)
        readline chaosfile []

    //a recreation of the above function using a while loop
    let ImportDataWhile (path:string) : list<string []> =
        use chaosfile = new StreamReader(path)
        //values in a loop construct must be mutable
        let mutable retval = []
        //loop
        while chaosfile.EndOfStream <> true do
            retval <- chaosfile.ReadLine().Split(',')::retval
        //return retval by just declaring it
        retval

    let CSVlines (path:string) : string seq =
        seq { use streamreader = new StreamReader(path)
              while not streamreader.EndOfStream do
                  yield streamreader.ReadLine() }

    let ImportDataSeq (path:string) : string [] list =
        let mutable retval = []
        let sequencer = CSVlines path
        for line in sequencer do
            retval <- line.Split()::retval
        retval

C# code

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.IO;
    using System.Text;

    namespace CSVparse
    {
        public class CSVprocess
        {
            public static List<string[]> ImportDataC(string path)
            {
                List<string[]> retval = new List<string[]>();
                using (StreamReader readfile = new StreamReader(path))
                {
                    string line = readfile.ReadLine();
                    while (line != null)
                    {
                        retval.Add(line.Split());
                        line = readfile.ReadLine();
                    }
                }
                return retval;
            }

            public static List<string[]> ImportDataReadLines(string path)
            {
                List<string[]> retval = new List<string[]>();
                IEnumerable<string> toparse = File.ReadLines(path);
                foreach (string split in toparse)
                {
                    retval.Add(split.Split());
                }
                return retval;
            }
        }
    }

Note the many implementations: iterators, sequences, tail-call recursion, and while loops, in both languages...

The main problem is that I'm hitting the disk, so some of the idiosyncrasies may be attributable to that. I intend to rewrite this code to read from a MemoryStream instead, which should give more consistent results (assuming I don't start swapping).
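
Roughly what I have in mind is a sketch like this (not benchmarked yet; the names are placeholders):

    // Sketch: read the file into memory once, then parse from a MemoryStream
    // so the timed section excludes disk I/O.
    string path = "test.csv";
    byte[] bytes = File.ReadAllBytes(path);       // one untimed sequential disk read
    using (var memory = new MemoryStream(bytes))
    using (var reader = new StreamReader(memory))
    {
        List<string[]> rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
            rows.Add(line.Split(','));
    }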

But everything I've been taught and read suggests that while loops and for loops are faster than tail-call-optimized recursion, yet every actual benchmark I run says the dead opposite.

So I guess my question is: should I be questioning the conventional wisdom?

Is tail-call recursion really better than looping in the .NET ecosystem?

How does it work in Mono?

3 answers

I think the difference may come from the different Lists in F# and C#. F# uses singly linked lists (see http://msdn.microsoft.com/en-us/library/dd233224.aspx ), whereas in C# System.Collections.Generic.List is used, which is array-based.

Prepending is much faster for singly linked lists, especially when you're parsing big files (the array-backed list has to allocate and copy its whole backing array from time to time as it grows).
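
You can actually watch the array-backed list reallocating; here's a small illustration of mine (not part of the benchmark), using the Capacity property:

    // Watch List<T> grow: each capacity change means a new backing array
    // was allocated and the old contents copied across.
    var list = new List<int>();
    int lastCapacity = list.Capacity;
    for (int i = 0; i < 1000000; i++)
    {
        list.Add(i);
        if (list.Capacity != lastCapacity)   // capacity roughly doubles each time
        {
            Console.WriteLine("resized to {0} at count {1}", list.Capacity, list.Count);
            lastCapacity = list.Capacity;
        }
    }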

Try using a LinkedList<T> in the C# code; I'd be interested in the results :) ...
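
Something along these lines (my sketch; ImportDataLinked is a name I made up, not from your code):

    // LinkedList-based variant of the C# import: AddFirst prepends in O(1)
    // with no copying, analogous to F#'s line.Split(',') :: lines.
    public static LinkedList<string[]> ImportDataLinked(string path)
    {
        var retval = new LinkedList<string[]>();
        using (var readfile = new StreamReader(path))
        {
            string line;
            while ((line = readfile.ReadLine()) != null)
                retval.AddFirst(line.Split(','));
        }
        return retval;
    }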

P.S.: Also, this would be a good example of when to use a profiler. You could easily find the "hot spot" of the C# code...

EDIT

So I tried this for myself: I used two identical files to prevent caching effects. The files were 3,000,000 lines, each consisting of "abcdef" repeated 10 times, separated by commas.

The main program is as follows:

    static void Main(string[] args)
    {
        var dt = DateTime.Now;
        CSVprocess.ImportDataC("test.csv"); // C# implementation
        System.Console.WriteLine("Time {0}", DateTime.Now - dt);
        dt = DateTime.Now;
        CSVImport.ImportData("test1.csv"); // F# implementation
        System.Console.WriteLine("Time {0}", DateTime.Now - dt);
    }

(I also tried it with the F# implementation executing first and then the C#...)
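
As an aside, Stopwatch is a bit more precise than subtracting DateTime.Now values; an equivalent harness would be (same calls, just a different timer):

    var sw = System.Diagnostics.Stopwatch.StartNew();
    CSVprocess.ImportDataC("test.csv");            // C# implementation
    sw.Stop();
    System.Console.WriteLine("C# time: {0} ms", sw.ElapsedMilliseconds);

    sw = System.Diagnostics.Stopwatch.StartNew();
    CSVImport.ImportData("test1.csv");             // F# implementation
    sw.Stop();
    System.Console.WriteLine("F# time: {0} ms", sw.ElapsedMilliseconds);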

Result:

  • C#: 3.7 seconds
  • F#: 7.6 seconds

Running the C# solution after the F# solution gives the same performance for the F# version but 4.7 seconds for C# (I assume due to the heavy memory allocation by the F# solution). Running each solution on its own doesn't change the results above.

Using a file with 6,000,000 lines gives ~7 seconds for the C# solution; the F# solution throws an OutOfMemoryException (I'm running this on a machine with 12 GB of RAM...).

So it seems to me that the conventional "wisdom" is true, and C# with a simple loop is faster for tasks like this...


You really, really, really, really shouldn't read anything into these results - either benchmark your whole system as a form of system test, or remove the disk I/O from the benchmark. It will just confuse matters. It's probably better practice to take a TextReader parameter rather than a physical path, to avoid chaining the implementation to physical files.
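
Something like this sketch (illustrative only; adjust the split character to whatever you settle on):

    // Take a TextReader instead of a file path: benchmarks and tests can pass
    // a StringReader (pure in-memory); production code passes a StreamReader.
    public static List<string[]> ImportData(TextReader reader)
    {
        var rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
            rows.Add(line.Split(','));
        return rows;
    }

    // e.g. in a test, with no file system involved:
    // var rows = ImportData(new StringReader("a,b,c\n1,2,3"));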

Additionally, as a microbenchmark, your test has a few other flaws:

  • You define numerous functions that are not called during the benchmark. Are you testing ImportDataC or ImportDataReadLines? Pick one for clarity - and in real applications, don't duplicate implementations; factor out the similarities and define one in terms of the other.
  • You call .Split(',') in F# but .Split() in C# - do you intend to split on commas or on whitespace?
  • You're reinventing the wheel - at least compare your implementation against much shorter versions using higher-order functions (aka LINQ); see the sketch after this list.
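
For instance, something like this (my sketch; it splits on ',' for the sake of argument):

    // A much shorter version built from File.ReadLines plus LINQ.
    public static List<string[]> ImportDataLinq(string path)
    {
        return File.ReadLines(path)
                   .Select(line => line.Split(','))
                   .ToList();
    }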

I note that it looks like your F# is using the F# list type, while your C# is using the .NET List. You might try changing the F# to use the other list type, for more data.

