How is IEnumerable different from IObservable under the hood?

I'm curious how IEnumerable differs from IObservable under the hood. I understand the pull and push patterns respectively, but how does C#, in terms of memory and so on, notify subscribers (for IObservable) that the next piece of data is ready to be processed? How does the observed instance know that it has new data to push to its subscribers?
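For reference, here is roughly how I understand the two shapes. These are the standard .NET interfaces, shown simplified (the real IEnumerator&lt;T&gt; also inherits the non-generic IEnumerator): the enumerator side has values pulled out of it, the observer side has values pushed into it.

    // Pull: the consumer drives iteration by calling MoveNext().
    public interface IEnumerable<out T>
    {
        IEnumerator<T> GetEnumerator();
    }

    public interface IEnumerator<out T> : IDisposable
    {
        bool MoveNext();     // consumer asks for the next item
        T Current { get; }
    }

    // Push: the producer drives by calling OnNext() on each subscriber.
    public interface IObservable<out T>
    {
        IDisposable Subscribe(IObserver<T> observer);
    }

    public interface IObserver<in T>
    {
        void OnNext(T value);           // producer pushes the next item
        void OnError(Exception error);
        void OnCompleted();
    }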

My question comes from a test I ran reading the lines of a file. The file was about 6 MB.

Standard time: 4.7s, lines: 36587

Rx time: 0.68s, lines: 36587

How can Rx offer such a significant improvement over a normal iteration over each line in a file?

    private static void ReadStandardFile()
    {
        var timer = Stopwatch.StartNew();
        var linesProcessed = 0;
        foreach (var l in ReadLines(new FileStream(_filePath, FileMode.Open)))
        {
            var s = l.Split(',');
            linesProcessed++;
        }
        timer.Stop();
        _log.DebugFormat("Standard Time Taken: {0}s, lines: {1}",
            timer.Elapsed.ToString(), linesProcessed);
    }

    private static void ReadRxFile()
    {
        var timer = Stopwatch.StartNew();
        var linesProcessed = 0;
        var query = ReadLines(new FileStream(_filePath, FileMode.Open)).ToObservable();

        using (query.Subscribe((line) =>
        {
            var s = line.Split(',');
            linesProcessed++;
        }));

        timer.Stop();
        _log.DebugFormat("Rx Time Taken: {0}s, lines: {1}",
            timer.Elapsed.ToString(), linesProcessed);
    }

    private static IEnumerable<string> ReadLines(Stream stream)
    {
        using (StreamReader reader = new StreamReader(stream))
        {
            while (!reader.EndOfStream)
                yield return reader.ReadLine();
        }
    }
2 answers

My guess is that the behavior you are seeing is a reflection of the OS file cache. I would wager that if you swapped the order of the two calls, you would see a similar difference in speeds, just reversed.

You can improve this test by doing a few warm-up runs, or by copying the input file to a temp file with File.Copy before testing each variant. That way the file would not be "hot" in the cache and you would get a fair comparison.
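As a rough illustration of the warm-up idea, something like the sketch below could be run before each timed method. It reuses the ReadLines helper and _filePath field from the question; the exact effect will of course depend on your machine and file.

    private static void WarmUpFileCache()
    {
        // Read the whole file once and discard the lines, so that both
        // timed runs start with the file equally "hot" in the OS cache.
        foreach (var line in ReadLines(new FileStream(_filePath, FileMode.Open)))
        {
            // intentionally empty
        }
    }

    // Usage: call before each timed run.
    // WarmUpFileCache();
    // ReadStandardFile();
    // WarmUpFileCache();
    // ReadRxFile();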


I suspect you are seeing some kind of caching effect rather than an Rx speedup. The contents of the file are probably held in memory between the two calls, so that ToObservable can pull the content faster...

Edit: Ah, a fellow answerer with a crazy nickname, @sixlettervariables, was quicker, and he is probably right: it's more likely an OS optimization than a CLR one.

