LINQ Performance Improvement

I have a LINQ statement:

    var records = from line in myfile
                  let data = line.Split(',')
                  select new { a = int.Parse(data[0]), b = int.Parse(data[1]) };

    var average = records.Sum(r => r.b) != 0
        ? records.Sum(r => r.a) / records.Sum(r => r.b)
        : 0;

My question is: how many times is records.Sum(r => r.b) evaluated in the last line? Does LINQ loop through all the records each time it needs to calculate a sum (in this case, three Sum() calls, so three loops)? Or does it cleverly iterate over all the records only once and compute all the sums?


Edit 1:

  • I wonder if there is a way to improve it by going through all the records only once (a simple for loop needs only one pass).

  • There is also no need to load everything into memory before computing the sums and the average. We could add up each element as it is read from the file. Is there a way to reduce memory consumption as well?


Edit 2:

To clarify a bit: I was not using LINQ before I ended up with the code above. A simple while/for loop meets all the performance requirements. But then I tried to improve readability and reduce the number of lines by using LINQ. It seems that we cannot get both at the same time.

+7
c# linq
6 answers

There are many answers here, but none of them covers all of your questions.

How many times is records.Sum(r => r.b) evaluated in the last line?

Three times.

Does LINQ iterate over all the records each time it needs to calculate a sum (in this case, three Sum() calls, so three loops)?

Yes.

Or does it cleverly iterate over all the records only once and compute all the sums?

No.
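
To see the three enumerations directly, here is a minimal sketch. It assumes C# 9 or later (top-level statements) and uses an in-memory array as a stand-in for the file; `ParseAndCount` and `parsedLines` are hypothetical names added only to count how often the projection runs.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for the lines of the file.
var myfile = new[] { "10,2", "20,3", "30,5" };

// Counts how many lines the query has parsed in total.
int parsedLines = 0;

// Deferred query: nothing is parsed until something enumerates it.
var records = from line in myfile
              let data = line.Split(',')
              select new
              {
                  a = ParseAndCount(data[0]),
                  b = int.Parse(data[1])
              };

// Each Sum call enumerates the query again from the start.
var average = records.Sum(r => r.b) != 0
    ? records.Sum(r => r.a) / records.Sum(r => r.b)
    : 0;

Console.WriteLine($"average = {average}, lines parsed = {parsedLines}");
// prints "average = 6, lines parsed = 9"

// Parses one field and records that a line was projected.
int ParseAndCount(string s)
{
    parsedLines++;
    return int.Parse(s);
}
```

With three input lines and three Sum calls, the projection runs nine times, which matches the "three times" answer above.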

I wonder if there is a way to improve it by going through all the records only once (a simple for loop needs only one pass)?

You can do this, but it requires eagerly loading all the data, which contradicts your next question.

There is also no need to load everything into memory before computing the sums and the average. We could add up each element as it is read from the file. Is there a way to reduce memory consumption?

That is correct. In the original post you have a variable called myfile, and you iterate over it, putting each item into a local variable called line (read: basically a foreach). Since you did not show how you obtained the myfile data, I assume that you are eagerly loading all of it.

Here is a quick example of lazy loading of your data:

    // Requires: using System; using System.Collections.Generic; using System.IO;

    public IEnumerable<string> GetData()
    {
        using (var fileStream = File.OpenRead(@"C:\Temp\MyData.txt"))
        {
            using (var streamReader = new StreamReader(fileStream))
            {
                string line;
                // Yield one line at a time so the whole file is never held in memory.
                while ((line = streamReader.ReadLine()) != null)
                {
                    yield return line;
                }
            }
        }
    }

    public void CalculateSumAndAverage()
    {
        var sumA = 0;
        var sumB = 0;
        var average = 0;

        foreach (var line in GetData())
        {
            var split = line.Split(',');
            var a = Convert.ToInt32(split[0]);
            var b = Convert.ToInt32(split[1]);
            sumA += a;
            sumB += b;
        }

        // I'm not a big fan of ternary operators,
        // but feel free to convert this if you so desire.
        if (sumB != 0)
        {
            average = sumA / sumB;
        }
        else
        {
            // This else clause is redundant, but I converted it from a ternary operator.
            average = 0;
        }
    }
+5

Write it like this and the records will be enumerated twice instead of three times:

    var sum = records.Sum(r => r.b);
    var average = sum != 0 ? records.Sum(r => r.a) / sum : 0;
+9

Three times, and you should use Aggregate here, not Sum .

    // do your original selection
    var records = from line in myfile
                  let data = line.Split(',')
                  select new { a = int.Parse(data[0]), b = int.Parse(data[1]) };

    // aggregate them into one record; anonymous-type properties are read-only,
    // so each step returns a new record instead of mutating the running one
    var sumRec = records.Aggregate(
        new { a = 0, b = 0 },
        (runningSum, next) => new { a = runningSum.a + next.a, b = runningSum.b + next.b });

    // calculate your average
    var average = sumRec.b != 0 ? sumRec.a / sumRec.b : 0;
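
As a variation on the Aggregate idea, here is a minimal sketch that sums both columns in one pass over any sequence of lines. It assumes C# 9 or later (top-level statements, tuples); `SumColumns` is a hypothetical helper name, and the sample array stands in for the file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: sums both columns in a single pass.
// The seed (0, 0) also makes an empty input return zeros instead of throwing.
static (int SumA, int SumB) SumColumns(IEnumerable<string> lines) =>
    lines
        .Select(line => line.Split(','))
        .Aggregate(
            (SumA: 0, SumB: 0),
            (acc, data) => (acc.SumA + int.Parse(data[0]),
                            acc.SumB + int.Parse(data[1])));

// In-memory sample so the sketch runs on its own. For a real file,
// System.IO.File.ReadLines(path) yields lines lazily and could be passed instead.
var sample = new[] { "10,2", "20,3", "30,5" };

var sums = SumColumns(sample);
var average = sums.SumB != 0 ? sums.SumA / sums.SumB : 0;

Console.WriteLine($"sumA = {sums.SumA}, sumB = {sums.SumB}, average = {average}");
// prints "sumA = 60, sumB = 10, average = 6"
```

Because Aggregate pulls one element at a time, feeding it a lazily read sequence should keep only the current line and the running totals in memory.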
+4

Each call to the Sum method iterates over all the lines in myfile. To improve performance, write:

    var records = (from line in myfile
                   let data = line.Split(',')
                   select new { a = int.Parse(data[0]), b = int.Parse(data[1]) }).ToList();

to create a list with all the elements (with the properties "a" and "b"). Each call to the Sum method will then walk this list without splitting and parsing the data again. Of course, you can go further and store the result of the Sum method in a temporary variable.

+2

James, I'm not an expert at all; this is just my idea. I think this can be reduced to one pass, perhaps at the cost of some more code. records is still an IEnumerable of the anonymous type { int a, int b }.

* dynamic was a quick way to get this working. You should write a struct for it instead.

    int sum_a = 0, sum_b = 0;

    // Accumulates the sums as a side effect while projecting each line.
    Func<string[], dynamic> b = (string[] data) =>
    {
        int a0 = int.Parse(data[0]);
        int b0 = int.Parse(data[1]);
        sum_a += a0;
        sum_b += b0;
        return new { a = a0, b = b0 };
    };

    // The query is deferred, so ToList() forces the single pass that fills the sums.
    var records = (from line in fileLines
                   let data = line.Split(',')
                   let result = b(data)
                   select new { a = (int)result.a, b = (int)result.b }).ToList();

    var average = sum_b != 0 ? sum_a / sum_b : 0;

With a struct, it is simple:

    public struct Int_Int // could also be a class or an interface for mapping
    {
        public int a, b; // struct fields default to 0
    }

Then

    int sum_a = 0, sum_b = 0;

    // Accumulates the sums as a side effect while projecting each line.
    Func<string[], Int_Int> b = (string[] data) =>
    {
        int a0 = int.Parse(data[0]);
        int b0 = int.Parse(data[1]);
        sum_a += a0;
        sum_b += b0;
        return new Int_Int { a = a0, b = b0 };
    };

    // The query is deferred, so ToList() forces the single pass that fills the sums.
    var records = (from line in fileLines
                   let data = line.Split(',')
                   select b(data)).ToList();

    var average = sum_b != 0 ? sum_a / sum_b : 0;
+1

Sum enumerates all the records every time you call it. I recommend using ToList():

    var records = (from line in myfile
                   let data = line.Split(',')
                   select new { a = int.Parse(data[0]), b = int.Parse(data[1]) }).ToList();

    var sumb = records.Sum(r => r.b);
    var average = sumb != 0 ? records.Sum(r => r.a) / sumb : 0;
0