I think you already got answers on #haskell; basically, each writeSTRef comes down to one or two memory writes, which are cheap in this case, since they will probably never go past the level 1 cache.
On the other hand, the branch resulting from the if-then-else in fib3 creates two paths that are taken alternately on successive iterations, which is a bad case for many processor branch predictors and adds bubbles to the pipeline. See http://en.wikipedia.org/wiki/Instruction_pipeline .
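To make the cost of writeSTRef concrete, here is a minimal sketch of a branch-free STRef loop. The name fibST is hypothetical and this is not the fib3 or fib4 from the question; it only assumes the usual two-variable Fibonacci recurrence. Each iteration does two reads and two writes to the same two cells, which is why the writes are likely to stay in the level 1 cache.

```haskell
import Control.Monad (replicateM_)
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, readSTRef, writeSTRef)

-- Hypothetical illustration, not the asker's code.
-- Starts from (0, 1) and steps (a, b) -> (b, a + b) n times,
-- then returns the second component.
fibST :: Int -> Integer
fibST n = runST $ do
    ra <- newSTRef 0
    rb <- newSTRef 1
    replicateM_ n $ do
        a <- readSTRef ra
        b <- readSTRef rb
        writeSTRef ra b
        -- ($!) forces the sum before it is stored,
        -- so no chain of unevaluated additions builds up.
        writeSTRef rb $! a + b
    readSTRef rb
```

There is no if-then-else inside the loop body, so apart from the loop counter there is no data-dependent branch for the predictor to miss.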
How about a pure version, with no STRef at all?
    fib0 :: Int -> Integer
    fib0 = go 0 1
      where
        go :: Integer -> Integer -> Int -> Integer
        go a b n = case n > 0 of
            True  -> go b (a + b) (n - 1)
            False -> b
This is even faster:
    benchmarking fib0 40000
    mean: 17.14679 ms, lb 17.12902 ms, ub 17.16739 ms, ci 0.950
    std dev: 97.28594 us, lb 82.39644 us, ub 120.1041 us, ci 0.950

    benchmarking fib3 40000
    mean: 17.32658 ms, lb 17.30739 ms, ub 17.34931 ms, ci 0.950
    std dev: 106.7610 us, lb 89.69371 us, ub 126.8279 us, ci 0.950

    benchmarking fib4 40000
    mean: 18.13887 ms, lb 18.11173 ms, ub 18.16868 ms, ci 0.950
    std dev: 145.9772 us, lb 127.6892 us, ub 168.3347 us, ci 0.950
liyang