There is already a lot of discussion on this topic, but I'm all about flogging dead horses, especially when I discover that they can still breathe.
I was working on parsing that unusual and exotic file format, the CSV, and for fun I decided to characterize its performance against the two .NET languages I know: C# and F#.
The results were... disturbing. F# won by a large margin, a factor of 2 or more (and I actually think it's more like .5n, but getting real benchmarks is proving tough since I'm testing against hardware IO).
Divergent performance characteristics in something as common as reading a CSV surprise me (note that the factor means C# wins on very small files; the more testing I do, the more it feels like C# performs worse relatively, which is both surprising and concerning, since it probably means I'm doing something wrong).
Some notes: Core 2 Duo laptop, 80 GB spindle drive, 3 GB of DDR-800 memory, Windows 7 Home Premium 64-bit, .NET 4, no power-saving options enabled.
30,000 lines, 5 fields wide, each field a phrase of 10 characters or less, gives me a factor of 3 in favor of tail-call recursion after the first run (it looks like the OS caches the file).
300,000 lines (the same data repeated) is a factor of 2 for tail-call recursion, with F#'s mutable implementation winning out slightly, but the performance signatures indicate that I am hitting the disk and not RAM-disking the whole file, which causes semi-random performance spikes.
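For context, the test files have roughly the shape produced by a generator like the sketch below (the output path and the random-phrase scheme are placeholders, not the exact data I used):

//sketch of a generator that reproduces the test data shape described above;
//the output path and the random-phrase scheme are arbitrary placeholders
module CSVTestData

open System
open System.IO

let writeTestFile (path:string) (lines:int) (width:int) =
    let rng = Random()
    //random lowercase phrase of 10 characters or less
    let phrase () =
        String(Array.init (rng.Next(1, 11)) (fun _ -> char (rng.Next(int 'a', int 'z' + 1))))
    use writer = new StreamWriter(path)
    for _ in 1 .. lines do
        writer.WriteLine(String.Join(",", Array.init width (fun _ -> phrase ())))

//e.g. writeTestFile "testdata.csv" 30000 5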
F# code
//Module used to import data from an arbitrary CSV source
module CSVImport

open System.IO

//imports the data from a path into a list of string arrays
let ImportData (path:string) : List<string []> =

    //recursively rips through the file, grabbing a line and adding it to the list
    let rec readline (reader:StreamReader) (lines:List<string []>) : List<string []> =
        let line = reader.ReadLine()
        match line with
        | null -> lines
        | _ -> readline reader (line.Split(',')::lines)

    //grab a file and open it, then return the parsed data
    use chaosfile = new StreamReader(path)
    readline chaosfile []

//a recreation of the above function using a while loop
let ImportDataWhile (path:string) : list<string []> =
    use chaosfile = new StreamReader(path)
    //values mutated in a loop construct must be mutable
    let mutable retval = []
    //loop
    while chaosfile.EndOfStream <> true do
        retval <- chaosfile.ReadLine().Split(',')::retval
    //return retval by just declaring it
    retval

//lazily yields the lines of the file as a sequence
let CSVlines (path:string) : string seq =
    seq { use streamreader = new StreamReader(path)
          while not streamreader.EndOfStream do
              yield streamreader.ReadLine() }

//consumes the sequence, splitting each line on commas for consistency with the above
let ImportDataSeq (path:string) : string [] list =
    let mutable retval = []
    let sequencer = CSVlines path
    for line in sequencer do
        retval <- line.Split(',')::retval
    retval
C# code
using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
using System.Text;

namespace CSVparse
{
    public class CSVprocess
    {
        //reads the file line by line with a while loop, splitting each line on commas
        public static List<string[]> ImportDataC(string path)
        {
            List<string[]> retval = new List<string[]>();
            using (StreamReader readfile = new StreamReader(path))
            {
                string line = readfile.ReadLine();
                while (line != null)
                {
                    retval.Add(line.Split(','));
                    line = readfile.ReadLine();
                }
            }
            return retval;
        }

        //same parse, but iterating the lazy enumerable returned by File.ReadLines
        public static List<string[]> ImportDataReadLines(string path)
        {
            List<string[]> retval = new List<string[]>();
            IEnumerable<string> toparse = File.ReadLines(path);

            foreach (string split in toparse)
            {
                retval.Add(split.Split(','));
            }
            return retval;
        }
    }
}
Note the multitude of implementations: iterators, sequences, tail-call recursion, and while loops, in two languages...
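If you want to reproduce the comparison, a minimal Stopwatch harness along these lines is enough (the path is a placeholder, and it assumes the C# assembly is referenced from the F# project):

//minimal Stopwatch harness over the implementations above; path is a placeholder
module CSVBench

open System.Diagnostics

let time label f =
    let sw = Stopwatch.StartNew()
    let result = f ()
    sw.Stop()
    printfn "%s: %d ms (%d rows)" label sw.ElapsedMilliseconds (List.length result)

let run (path:string) =
    //run each implementation twice so the second pass reflects the OS file cache
    for _ in 1 .. 2 do
        time "tail recursion" (fun () -> CSVImport.ImportData path)
        time "while loop"     (fun () -> CSVImport.ImportDataWhile path)
        time "seq expression" (fun () -> CSVImport.ImportDataSeq path)
        time "C# while"       (fun () -> List.ofSeq (CSVparse.CSVprocess.ImportDataC path))
        time "C# ReadLines"   (fun () -> List.ofSeq (CSVparse.CSVprocess.ImportDataReadLines path))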
The major issue is that I am hitting the disk, so some idiosyncrasies can be accounted for by that. I intend to rewrite this code to read from a memory stream (which should be more consistent, assuming I don't start swapping).
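Roughly, the memory-stream variant I have in mind looks like this sketch: read the file into memory once, then run the same tail-recursive parse over a StreamReader wrapped around a MemoryStream:

//sketch of the memory-stream variant: the file is read into memory once,
//then the same tail-recursive parse runs against an in-memory StreamReader
module CSVImportInMemory

open System.IO

let ImportDataFromMemory (path:string) : string [] list =
    let bytes = File.ReadAllBytes(path)
    use memory = new MemoryStream(bytes)
    use reader = new StreamReader(memory)
    let rec readline (reader:StreamReader) (lines:string [] list) =
        match reader.ReadLine() with
        | null -> lines
        | line -> readline reader (line.Split(',') :: lines)
    readline reader []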
But everything I have been taught/read says that while loops/for loops are faster than tail-call optimization/recursion, and every actual benchmark I run says the dead opposite.
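As far as I understand it, the F# compiler compiles a direct self-tail-call into a branch rather than an actual call, so the recursive readline shouldn't pay any per-line call overhead; a trivial pair like this (made-up names) should end up as essentially the same loop in IL:

//the F# compiler turns a direct self-tail-call like this into a branch (a loop)
//in the emitted IL, so these two should compile to essentially the same thing
let rec countDownRec n =
    if n = 0 then 0
    else countDownRec (n - 1)   //tail position: no work left after the call

let countDownLoop n =
    let mutable i = n
    while i > 0 do
        i <- i - 1
    i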
So I guess my question is, should I question the conventional wisdom?
Is tail-call recursion really better than while loops in the .NET ecosystem?
How does it work in Mono?