Let me compare the actual numbers. Compiled without optimization, your version of kolakoski uses about 70 KB (maximum residency):
$ ghc --make Kolakoski-Unit && ./Kolakoski-Unit +RTS -s
2
288,002,359,096 bytes allocated in the heap
1,343,933,816 bytes copied during GC
67,576 bytes maximum residency (422 sample(s))
52,128 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 551615 colls, 0 par 1.89s 2.30s 0.0000s 0.0001s
Gen 1 422 colls, 0 par 0.02s 0.02s 0.0001s 0.0001s
INIT time 0.00s (0.00s elapsed)
MUT time 37.34s (37.25s elapsed)
GC time 1.91s (2.33s elapsed)
EXIT time 0.00s (0.00s elapsed)
Total time 39.25s (39.58s elapsed)
% GC time 4.9% (5.9% elapsed)
Alloc rate 7,712,197,063 bytes per MUT second
Productivity 95.1% of total user, 94.3% of total elapsed
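The program being measured isn't repeated here. Judging from the unoptimized Core dump further down, Kolakoski-Unit.hs looks roughly like the following. Treat it as a hypothetical reconstruction: only kolakoski is read off the Core. The command-line handling in main is an assumption. The original hard-codes the index 500000000, while this sketch reads an optional index and defaults to 1000 so that a quick run stays cheap.

```haskell
-- Kolakoski-Unit.hs: hypothetical reconstruction inferred from the
-- unoptimized Core dump; main's argument handling is an assumption.
module Main (main) where

import Data.Maybe (listToMaybe)
import System.Environment (getArgs)

-- The dummy () argument turns the list into a function, so each call
-- is meant to build a fresh list instead of sharing one top-level value.
kolakoski :: () -> [Int]
kolakoski () = 1 : 2 : helper ()
  where
    -- the run lengths are read from a fresh inner `helper ()` list
    helper () = 2 : concat (zipWith replicate (helper ()) (cycle [1, 2]))

main :: IO ()
main = do
  args <- getArgs
  -- Assumed CLI: optional index, default 1000; the runs in this answer
  -- use index 500000000, which the original program hard-codes.
  let n = maybe 1000 read (listToMaybe args)
  print (kolakoski () !! n)
```

Pass 500000000 as the first argument to evaluate the same element as the runs shown here.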
With optimization, it uses about 4 GB (although the actual memory usage shown in Task Manager reaches ~6 GB):
$ ghc --make Kolakoski-Unit -O && ./Kolakoski-Unit +RTS -s
2
64,000,066,608 bytes allocated in the heap
27,971,527,816 bytes copied during GC
3,899,031,480 bytes maximum residency (34 sample(s))
91,679,728 bytes maximum slop
9549 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 122806 colls, 0 par 8.67s 9.48s 0.0001s 0.0148s
Gen 1 34 colls, 0 par 11.55s 69.78s 2.0524s 56.2970s
INIT time 0.00s (0.00s elapsed)
MUT time 8.80s (8.43s elapsed)
GC time 20.22s (79.26s elapsed)
EXIT time 0.03s (0.89s elapsed)
Total time 29.05s (88.58s elapsed)
% GC time 69.6% (89.5% elapsed)
Alloc rate 7,275,318,406 bytes per MUT second
Productivity 30.4% of total user, 10.0% of total elapsed
If we use a list-based version and compile it without optimization, its memory consumption is very similar to that of the optimized unit version:
kolakoskiList :: [Int]
kolakoskiList = 1 : 2 : helper
  where
    helper = 2 : concat (zipWith replicate helper (cycle [1, 2]))
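To measure kolakoskiList it has to be wrapped in a program. Below is a minimal sketch of Kolakoski-List.hs. The main function is an assumption modelled on the unit version: it reads an optional index (default 1000), whereas the run below corresponds to index 500000000.

```haskell
-- Kolakoski-List.hs: sketch of a complete program around kolakoskiList;
-- main's argument handling is an assumption.
module Main (main) where

import Data.Maybe (listToMaybe)
import System.Environment (getArgs)

-- A plain top-level list: `helper` reads its own earlier elements as
-- run lengths, so those elements stay alive until zipWith consumes them.
kolakoskiList :: [Int]
kolakoskiList = 1 : 2 : helper
  where
    helper = 2 : concat (zipWith replicate helper (cycle [1, 2]))

main :: IO ()
main = do
  args <- getArgs
  -- Assumed CLI: optional index, default 1000 (the run uses 500000000)
  let n = maybe 1000 read (listToMaybe args)
  print (kolakoskiList !! n)
```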
$ ghc --make Kolakoski-List && ./Kolakoski-List +RTS -s
2
96,000,143,328 bytes allocated in the heap
26,615,974,536 bytes copied during GC
3,565,429,808 bytes maximum residency (34 sample(s))
83,610,688 bytes maximum slop
8732 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 184252 colls, 0 par 9.98s 10.16s 0.0001s 0.0006s
Gen 1 34 colls, 0 par 10.45s 21.61s 0.6357s 12.0792s
INIT time 0.00s (0.00s elapsed)
MUT time 12.02s (11.88s elapsed)
GC time 20.44s (31.77s elapsed)
EXIT time 0.05s (0.69s elapsed)
Total time 32.50s (44.34s elapsed)
% GC time 62.9% (71.7% elapsed)
Alloc rate 7,989,608,807 bytes per MUT second
Productivity 37.1% of total user, 27.2% of total elapsed
Now we can check the GHC flag reference for the flags that -O enables automatically. Since the list version seems to behave like the optimized one, you might suspect that GHC turns kolakoski into kolakoskiList:
kolakoskiOptimized _ = kolakoskiList
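Such a rewrite would be legal: kolakoski () and kolakoskiList denote the same infinite list, so only sharing (and therefore memory) changes, never the result. The following sketch checks this on a prefix. The prefix length of 10000 is arbitrary, and both definitions are copied from this answer.

```haskell
-- Sketch: the unit version and kolakoskiOptimized agree on their values;
-- only how much of the list is shared differs.
module Main (main) where

kolakoski :: () -> [Int]
kolakoski () = 1 : 2 : helper ()
  where
    helper () = 2 : concat (zipWith replicate (helper ()) (cycle [1, 2]))

kolakoskiList :: [Int]
kolakoskiList = 1 : 2 : helper
  where
    helper = 2 : concat (zipWith replicate helper (cycle [1, 2]))

-- the hypothetical rewrite: ignore the () and return the shared list
kolakoskiOptimized :: () -> [Int]
kolakoskiOptimized _ = kolakoskiList

main :: IO ()
main = do
  let n = 10000 -- arbitrary prefix length; values only, not memory
  print (take n (kolakoski ()) == take n (kolakoskiOptimized ()))
  -- prints True
```

The memory difference does not show up here at all; it only appears in the +RTS -s statistics.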
Let's check this in GHC Core with -ddump-simpl -dsuppress-all:
==================== Tidy Core ====================
Result size of Tidy Core = {terms: 45, types: 30, coercions: 0}

kolakoski
kolakoski =
  \ ds_d10r ->
    case ds_d10r of _ { () ->
    : (I# 1)
      (: (I# 2)
         (letrec {
            helper_aNo
            helper_aNo =
              \ ds1_d10s ->
                case ds1_d10s of _ { () ->
                : (I# 2)
                  (concat
                     (zipWith
                        (replicate)
                        (helper_aNo ())
                        (cycle (: (I# 1) (: (I# 2) ([]))))))
                }; } in
          helper_aNo ()))
    }

main
main = print $fShowInt (!! (kolakoski ()) (I# 500000000))

main
main = runMainIO main
Even if you are not familiar with GHC Core, you can see that kolakoski basically matches your original version. Now compare this with the output of -O -ddump-simpl -dsuppress-all:
==================== Tidy Core ====================
Result size of Tidy Core = {terms: 125, types: 102, coercions: 9}

kolakoski6
kolakoski6 = I# 1

kolakoski5
kolakoski5 = I# 2

Rec {
go_r1NG
go_r1NG =
  \ ds_a14B _ys_a14C ->
    case ds_a14B of _ {
      [] -> [];
      : ipv_a14H ipv1_a14I ->
        case _ys_a14C of _ {
          [] -> [];
          : ipv2_a14O ipv3_a14P ->
            case ipv_a14H of _ { I# n#_a13J ->
            case tagToEnum# (<=# n#_a13J 0) of _ {
              False ->
                let {
                  lvl2_s1N3
                  lvl2_s1N3 = : ipv2_a14O ([]) } in
                letrec {
                  xs_a1LH
                  xs_a1LH =
                    \ m_a1LO ->
                      case tagToEnum# (<=# m_a1LO 1) of _ {
                        False -> : ipv2_a14O (xs_a1LH (-# m_a1LO 1));
                        True -> lvl2_s1N3
                      }; } in
                ++ (xs_a1LH n#_a13J) (go_r1NG ipv1_a14I ipv3_a14P);
              True -> ++ ([]) (go_r1NG ipv1_a14I ipv3_a14P)
            }
            }
        }
    }
end Rec }

lvl_r1NH
lvl_r1NH = : kolakoski5 ([])

lvl1_r1NI
lvl1_r1NI = : kolakoski6 lvl_r1NH

Rec {
xs'_r1NJ
xs'_r1NJ = ++ lvl1_r1NI xs'_r1NJ
end Rec }

Rec {
kolakoski3
kolakoski3 = : kolakoski5 kolakoski4

kolakoski4
kolakoski4 = go_r1NG kolakoski3 xs'_r1NJ
end Rec }

kolakoski2
kolakoski2 = : kolakoski5 kolakoski3

kolakoski1
kolakoski1 = : kolakoski6 kolakoski2

kolakoski
kolakoski =
  \ ds_d13p -> case ds_d13p of _ { () -> kolakoski1 }
This version contains several CAFs (constant applicative forms) that are retained throughout the life of the program. This way you really do generate the list up to the 500,000,000th value and keep it in memory.
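Reading the -O Core back as ordinary Haskell makes the floated bindings easier to see. The following is a hypothetical source-level rendering, not what GHC literally produces. The names go, cycle12, helperCAF and kolakoskiFloated are invented, and the comments map them to the Core binders. The key point is that kolakoski3 is a self-referential top-level list, just like helper in kolakoskiList, and kolakoski ignores its () argument.

```haskell
-- Hypothetical source-level reading of the -O Core dump; all names are
-- invented, the comments give the corresponding Core binders.
module Main (main) where

-- go_r1NG: zipWith replicate and concat fused into a single loop
-- (the Core's n <= 0 branch yields [], which replicate already does)
go :: [Int] -> [Int] -> [Int]
go (n : ns) (y : ys) = replicate n y ++ go ns ys
go _ _ = []

-- xs'_r1NJ: a top-level CAF equal to cycle [1, 2]
cycle12 :: [Int]
cycle12 = [1, 2] ++ cycle12

-- kolakoski3 / kolakoski4: the floated-out `helper ()`, now a
-- self-referential top-level CAF shared by every call
helperCAF :: [Int]
helperCAF = 2 : go helperCAF cycle12

-- kolakoski: the () is ignored and the shared kolakoski1 is returned
kolakoskiFloated :: () -> [Int]
kolakoskiFloated () = 1 : 2 : helperCAF

main :: IO ()
main = print (take 10 (kolakoskiFloated ()))
-- prints [1,2,2,1,1,2,1,2,2,1]
```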
So what happened there? Something inside your function has been floated out to the top level. Let's look at the flag reference again. There is a promising flag that is implied by -O:
-ffull-laziness: Turn on full laziness (floating bindings outwards). Implied by -O.
And this is the flag that causes your problem. Indeed, you can compile with ghc --make -O -fno-full-laziness Kolakoski-Unit.hs to get back the original memory consumption:
$ ghc --make Kolakoski-Unit.hs -O -fno-full-laziness && ./Kolakoski-Unit +RTS -s
2
192,001,417,688 bytes allocated in the heap
637,962,464 bytes copied during GC
66,104 bytes maximum residency (151 sample(s))
43,448 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 368364 colls, 0 par 1.34s 1.32s 0.0000s 0.0002s
Gen 1 151 colls, 0 par 0.00s 0.01s 0.0001s 0.0003s
INIT time 0.00s (0.00s elapsed)
MUT time 27.89s (28.13s elapsed)
GC time 1.34s (1.33s elapsed)
EXIT time 0.00s (0.00s elapsed)
Total time 29.25s (29.46s elapsed)
% GC time 4.6% (4.5% elapsed)
Alloc rate 6,884,084,443 bytes per MUT second
Productivity 95.4% of total user, 94.7% of total elapsed
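If you would rather not rely on remembering the command-line flag, GHC also accepts per-file flags in an OPTIONS_GHC pragma at the top of the module. Here is a sketch, assuming you are happy to disable full laziness for the whole Kolakoski-Unit module (this also affects any other code in it). The main with an optional index argument (default 1000) is an assumption; the original hard-codes 500000000.

```haskell
{-# OPTIONS_GHC -fno-full-laziness #-}
-- Sketch: disable full laziness for this module only, so that a plain
-- `ghc --make -O` build keeps the per-call lists of the () version.
module Main (main) where

import Data.Maybe (listToMaybe)
import System.Environment (getArgs)

kolakoski :: () -> [Int]
kolakoski () = 1 : 2 : helper ()
  where
    helper () = 2 : concat (zipWith replicate (helper ()) (cycle [1, 2]))

main :: IO ()
main = do
  args <- getArgs
  -- Assumed CLI: optional index, default 1000 (the runs use 500000000)
  let n = maybe 1000 read (listToMaybe args)
  print (kolakoski () !! n)
```

Keeping the flag in the source file means the memory behaviour no longer depends on how the module happens to be built.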