Why does deep stack usage cause superlinear time behavior for a simple interpreter?

type Expr =
    | Lit of int
    | Add of Expr * Expr

let rec intr = function
    | Lit _ as x -> x
    | Add(Lit a,Lit b) -> Lit <| a + b
    | Add(a,b) -> intr <| Add(intr a, intr b)

let rec intr_cps x ret =
    match x with
    | Lit _ as x -> ret x
    | Add(Lit a,Lit b) -> Lit (a + b) |> ret
    | Add(a,b) -> 
        intr_cps a <| fun a ->
            intr_cps b <| fun b ->
                intr_cps (Add(a, b)) ret

let rec add n =
    if n > 1 then Add(Lit 1, add (n-1))
    else Lit 1

open System.Threading

let mem = 1024*1024*512 // ~536mb
// It stack overflows without being spun on a separate thread.
// By default, the program only has a few mb of stack memory at its disposal.
let run f = Thread(ThreadStart f,mem).Start() 

run <| fun _ ->
    let f n =
        let x = add n
        let stopwatch = System.Diagnostics.Stopwatch.StartNew()
        printfn "%A" (intr x)
        printfn "n_%i_std = %A" n stopwatch.Elapsed

        stopwatch.Restart()
        printfn "%A" (intr_cps x id)
        printfn "n_%i_cps = %A" n stopwatch.Elapsed
    f <| 1000*1000/2
    f <| 1000*1000
    f <| 1000*1000*2

//Lit 500000
//n_500000_std = 00:00:00.7764730
//Lit 500000
//n_500000_cps = 00:00:00.0800371
//Lit 1000000
//n_1000000_std = 00:00:02.9531043
//Lit 1000000
//n_1000000_cps = 00:00:00.1941828
//Lit 2000000
//n_2000000_std = 00:00:13.7823780
//Lit 2000000
//n_2000000_cps = 00:00:00.2767752

I have a much larger interpreter whose behavior I try to better understand, so I did the above. I'm definitely sure that the superlinear time scaling that I see in it for some examples is related to how it uses the stack, but I'm not sure why this happens, so I wanted to ask here.

As you can see, as I change nto 2x, the time changes a lot more, and it looks like the scaling is exponential, which is surprising to me. It is also surprising that the CPS'd interpreter is faster than stack based. Why is this?

, , .NET C?

+6
2

, , , .

let data = add n

( add n data ), CPS .

, , , .

+6

- , GC. , , - 32- , :

Lit 500000
n_500000_std = 00:00:00.3964533
Lit 500000
n_500000_cps = 00:00:00.0945109
Lit 1000000
n_1000000_std = 00:00:01.6021848
Lit 1000000
n_1000000_cps = 00:00:00.2143892
Lit 2000000
n_2000000_std = 00:00:08.0540017
Lit 2000000
n_2000000_cps = 00:00:00.3823931

, 2 64- . CPS'd PerfView tool. , GC.

CommandLine: "IntepreterBenchmark.exe"
Runtime Version: V 4.0.30319.0 (built on 6/6/2017 10:30:00 PM)
CLR Startup Flags: CONCURRENT_GC
Total CPU Time: 19,306 msec
Total GC CPU Time: 17,436 msec
Total Allocs : 202.696 MB
GC CPU MSec/MB Alloc : 86.020 MSec/MB
Total GC Pause: 17,421.9 msec
% Time paused for Garbage Collection: 90.2%
% CPU Time spent Garbage Collecting: 90.3%

. , GC , , .NET, GC , , .

32- . PerfView 64- 15 .

, 32- 2 , , 2x 64- .

0

All Articles