How to eliminate time spent in JIT_TailCall for functions that are truly non-recursive

I am writing a 64-bit F# solution, and profiling revealed a surprisingly large amount of time spent in JIT_TailCall ... in fact, it dominates the execution time (about 80%), along with its evil cousin JIT_TailCallHelperStub_ReturnAddress.

I traced the source to passing a struct type (a custom value type) in a method or property call across an assembly boundary. I am sure of this because if I bypass the method call and assign to my struct property directly (the one that used the offending method), performance magically improves: execution time drops by a factor of 4-5!

The calling assembly uses F# 3.1 because it is compiled dynamically with the latest stable version of FSharp.Compiler.Services.

The called assembly uses F# 4.0 / .NET 4.6 (VS 2015).

UPDATE

A simplification of what I'm trying to do: assign a custom struct value to an array slot from a dynamically generated assembly ...

Execution is fast and no tail calls are generated when calling:

  • A property exposing the private array inside the type

However, execution is slow due to extraneous tail calls generated when calling:

  • An indexer property exposing the array (Item)

  • A member method acting as a setter for the array

The reason I need to call a member method is that I need to do a few checks before inserting an element into the array.
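To make the pattern concrete, here is a minimal sketch of the three shapes of access described above (all type and member names are hypothetical, invented for illustration): a property exposing the private array directly (the fast path), an Item indexer, and a checked member-method setter (the two paths where the tail calls showed up in my profile):

```fsharp
[<Struct>]
type Price =
    { Value : float; Ticks : int64 }

type PriceBuffer(capacity : int) =
    let items = Array.zeroCreate<Price> capacity

    // Fast path: expose the private array itself; the caller indexes it
    // directly, so no cross-assembly method call carries the struct.
    member this.Items = items

    // Slow path 1: indexer property (Item) passing the struct across the boundary.
    member this.Item
        with get (i : int) = items.[i]
        and set (i : int) (v : Price) = items.[i] <- v

    // Slow path 2: member-method setter that validates before inserting.
    member this.Set(i : int, v : Price) =
        if i < 0 || i >= capacity then invalidArg "i" "index out of range"
        if v.Value < 0.0 then invalidArg "v" "price must be non-negative"
        items.[i] <- v
```

The checks in `Set` stand in for whatever validation the real code performs; the point is only that both `Item` and `Set` receive the struct as a call argument, while `Items` does not.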

PRACTICAL

Besides understanding the root of the problem, I would like to know whether F# 4.0, and therefore an upcoming version of FSharp.Compiler.Services, will solve this problem. Given that an updated FSharp.Compiler.Services is fairly inevitable, it might be best simply to wait.

1 answer

I posted this as a GitHub issue, but am cross-posting it here to make it easier to find:

I have a case where mutually recursive functions generate 30% load for JIT_TailCall and 15% load for JIT_TailCallHelperStub_ReturnAddress. These functions close over method variables and class fields. When I opt out of tail-call generation, performance improves by exactly 45%.

I did not profile this exact fragment, but my real code is structured exactly like this:

    #time "on"

    type MyRecType() =
        let list = System.Collections.Generic.List()

        member this.DoWork() =
            let mutable tcs = (System.Runtime.CompilerServices.AsyncTaskMethodBuilder<int>.Create())
            let returnTask = tcs.Task // NB! must access this property first
            let mutable local = 1

            let rec outerLoop() =
                if local < 1000000 then
                    innerLoop(1)
                else
                    tcs.SetResult(local)
                    ()
            and innerLoop(inc:int) =
                if local % 2 = 0 then
                    local <- local + inc
                    outerLoop()
                else
                    list.Add(local) // just fake access to a field to illustrate the pattern
                    local <- local + 1
                    innerLoop(inc)

            outerLoop()
            returnTask

    let instance = MyRecType()
    instance.DoWork().Result

    > Real: 00:00:00.019, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
    > val it : int = 1000001

.NET 4.6 and F# 4.0 do not help at all.

I tried to rewrite this as methods, but got a StackOverflowException. However, I do not understand why I do not get a stack overflow when I run a very large number of iterations with no tail calls generated.

Update: rewriting the method as:

    member this.DoWork2() =
        let mutable tcs = (System.Runtime.CompilerServices.AsyncTaskMethodBuilder<int>.Create())
        let returnTask = tcs.Task // NB! must access this property first
        let mutable local = 1

        let rec loop(isOuter:bool, inc:int) =
            if isOuter then
                if local < 1000000 then
                    loop(false, 1)
                else
                    tcs.SetResult(local)
                    ()
            else
                if local % 2 = 0 then
                    local <- local + inc
                    loop(true, 1)
                else
                    list.Add(local) // just fake access to a field to illustrate the pattern
                    local <- local + 1
                    loop(false, 1)

        loop(true, 1)
        returnTask

    > Real: 00:00:00.004, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
    > val it : int = 1000001

reduces the overhead of JIT_TailCall and JIT_TailCallHelperStub_ReturnAddress to 18% and 2% of the elapsed time respectively, and the method runs 2 times faster, so the actual overhead dropped from 45% to 10% of the initial time. Still high, but not as grim as in the first scenario.
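Not from the original answer, but for completeness: the remaining tail calls can be removed entirely by turning the DoWork2 state machine into an explicit while loop driven by the isOuter flag, so no recursive calls (and hence no tail-call instructions) are emitted at all. A sketch, with MyRecType2 / DoWork3 as made-up names:

```fsharp
type MyRecType2() =
    let list = System.Collections.Generic.List<int>()

    member this.DoWork3() =
        let mutable tcs = System.Runtime.CompilerServices.AsyncTaskMethodBuilder<int>.Create()
        let returnTask = tcs.Task // NB! must access this property first
        let mutable local = 1
        // The two mutually recursive functions become one loop driven by a flag:
        // isOuter = true plays the role of outerLoop, false of innerLoop.
        let mutable isOuter = true
        let mutable finished = false
        while not finished do
            if isOuter then
                if local < 1000000 then
                    isOuter <- false
                else
                    tcs.SetResult(local)
                    finished <- true
            else
                if local % 2 = 0 then
                    local <- local + 1
                    isOuter <- true
                else
                    list.Add(local) // same fake field access as the original
                    local <- local + 1
        returnTask
```

A plain while loop compiles to branches rather than calls, so JIT_TailCall cannot appear for this method; the trade-off is giving up the recursive structure of the original code.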

