LINQ query is slow

While profiling applications, I found that checking the function for pattern matching is very slow. It is written using LINQ. Simply replacing this LINQ expression with a loop makes a huge difference. What is it? Is LINQ really such a bad thing and it works so slowly, or am I misunderstanding something?

private static bool PatternMatch1(byte[] buffer, int position, string pattern) { int i = 0; foreach (char c in pattern) { if (buffer[position + i++] != c) { return false; } } return true; } 

version 2 with LINQ (proposed by Resharper)

 private static bool PatternMatch2(byte[] buffer, int position, string pattern) { int i = 0; return pattern.All(c => buffer[position + i++] == c); } 

version 3 with LINQ

 private static bool PatternMatch3(byte[] buffer, int position, string pattern) { return !pattern.Where((t, i) => buffer[position + i] != t).Any(); } 

version 4 using lambda

 private static bool PatternMatch4(byte[] buffer, int position, string pattern, Func<char, byte, bool> predicate) { int i = 0; foreach (char c in pattern) { if (predicate(c, buffer[position + i++])) { return false; } } return true; } 

and used here with a large buffer

 const int SIZE = 1024 * 1024 * 50; byte[] buffer = new byte[SIZE]; for (int i = 0; i < SIZE - 3; ++i) { if (PatternMatch1(buffer, i, "xxx")) { Console.WriteLine(i); } } 

Calling PatternMatch2 or PatternMatch3 phenomenally slow. This takes about 8 seconds for PatternMatch3 and about 4 seconds for PatternMatch2 , and when you call PatternMatch1 , about 0.6. As far as I understand, this is the same code, and I see no difference. Any ideas?

+8
performance c # linq
source share
4 answers

Well take the Where statement.

The implementation is almost (*):

 public IEnumerable<T> Where(this IEnumerable<T> input, Func<T, bool> fn) { foreach(T i in input) if (fn(i)) yield return i; } 

This means that an iterator object is created for each loop over IEnumerable - note that you have SIZE-n from these distributions because you are executing these many LINQ queries.

Then for each character in the pattern you:

  • Function call for enumerator
  • Call a predicate function

The second call is for the delegate, which costs about twice the cost of calling a typical virtual call (where in the first version you do not have additional calls sharing the array de-indexing.

If you really need rough performance, you probably want to get the "old style" code as much as possible. In this case, I would even replace this foreach in method 1 with a simple one (at least if it does not serve for optimization, this makes it more readable, since you are tracking the index anyway).

It is also more readable in this case, and it shows that Resharper's proposals are sometimes controversial.

(*) I said almost because it uses a proxy method to verify that the input counter is not null and throws an exception earlier than the collection count time - this is a small detail that does not invalidate what I wrote earlier, emphasizing here is the fullness.

+6
source share

Mark Byers and Marco Mp are fair for additional overhead calls. However, there is another reason: many object selections due to closure.

The compiler creates a new object at each iteration, preserving the current values โ€‹โ€‹of buffer , position and i that the predicate can use. Here is a snapshot of dotTrace PatternMatch2 showing this. <>c_DisplayClass1 is a class generated by the compiler.

(Note that the numbers are huge because I use trace profiling, which adds overhead. However, these are the same overhead for each call, so the overall percentages are correct).

enter image description here

+7
source share

The difference is that there are additional function calls in the LINQ version. Internally in LINQ, your lambda function is called in a loop.

While the extra call can be optimized using the JIT compiler , it is not guaranteed and can add a little overhead. In most cases, the extra overhead will not matter, but since your function is very simple and called an extremely large number of times, even small overheads can add up quickly. In my experience, LINQ is usually a bit slower than equivalent for loops. This is the execution price that you often pay for higher-level syntax.

In this situation, I would recommend adhering to an explicit loop.

While you are optimizing this section of code, you can also consider a more efficient search algorithm. Your worst case algorithm is O (n * m), but there are better algorithms .

+6
source share

The main purpose of LINQ when applied to collections is simplicity. If you need performance, you should completely avoid LINQ. Also, to enumerate an array, you simply increase the index variable, while LINQ needs to configure the entire enumerator object.

+1
source share

All Articles