I posted this as a GitHub issue, but I am cross-posting it here to make it easier to find:
I have a case where mutually recursive functions generate 30% load for JIT_TailCall and 15% load for JIT_TailCallHelperStub_ReturnAddress. These functions close over method variables and class fields. When I disable tail call generation, my performance improves by exactly 45%.
I did not profile this snippet, but my real code is structured exactly like this:

    #time "on"
    type MyRecType() =

        let list = System.Collections.Generic.List()

        member this.DoWork() =
            let mutable tcs = (System.Runtime.CompilerServices.AsyncTaskMethodBuilder<int>.Create())
            let returnTask = tcs.Task // NB! must access this property first
            let mutable local = 1

            let rec outerLoop() =
                if local < 1000000 then
                    innerLoop(1)
                else
                    tcs.SetResult(local)
                    ()

            and innerLoop(inc:int) =
                if local % 2 = 0 then
                    local <- local + inc
                    outerLoop()
                else
                    list.Add(local) // just fake access to a field to illustrate the pattern
                    local <- local + 1
                    innerLoop(inc)

            outerLoop()

            returnTask


    let instance = MyRecType()
    instance.DoWork().Result

    > Real: 00:00:00.019, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
    > val it : int = 1000001

.NET 4.6 and F# 4.0 do not help at all.
I tried rewriting these functions as methods, but got a StackOverflowException. What I do not understand is why I do not get a stack overflow when I run a very large number of iterations with tail call generation disabled.
Update: rewriting the method as:

    member this.DoWork2() =
        let mutable tcs = (System.Runtime.CompilerServices.AsyncTaskMethodBuilder<int>.Create())
        let returnTask = tcs.Task // NB! must access this property first
        let mutable local = 1

        let rec loop(isOuter:bool, inc:int) =
            if isOuter then
                if local < 1000000 then
                    loop(false,1)
                else
                    tcs.SetResult(local)
                    ()
            else
                if local % 2 = 0 then
                    local <- local + inc
                    loop(true,1)
                else
                    list.Add(local) // just fake access to a field to illustrate the pattern
                    local <- local + 1
                    loop(false,1)

        loop(true,1)

        returnTask

    > Real: 00:00:00.004, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
    > val it : int = 1000001

reduces the overhead of JIT_TailCall and JIT_TailCallHelperStub_ReturnAddress to 18% and 2% of the execution time, which is itself 2 times shorter, so the actual overhead decreased from 45% to 10% of the initial time. Still high, but not as dismal as in the first scenario.
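For anyone checking the 10% figure, the arithmetic can be sketched as below. It uses only the profiler percentages quoted above and assumes the rewritten version takes roughly half the original run time; the variable names are illustrative and not part of the real code.

```fsharp
// Profiler shares before the rewrite, as fractions of the original run time.
let originalOverhead = 0.30 + 0.15        // JIT_TailCall + JIT_TailCallHelperStub_ReturnAddress

// Profiler shares after the rewrite, as fractions of the NEW (shorter) run time.
let rewrittenOverhead = 0.18 + 0.02

// Assumption taken from the text: the rewritten version is about 2 times faster.
let relativeRunTime = 0.5

// Express the remaining overhead as a fraction of the ORIGINAL run time.
let overheadVsOriginal = rewrittenOverhead * relativeRunTime

printfn "before: %.2f, after: %.2f of the original run time" originalOverhead overheadVsOriginal
```

So the helpers still account for about a fifth of the rewritten version's own time, but only about a tenth when measured against the original.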