This is a follow-up to this question. I managed to get profiling to work, and the problem really does seem to be lazy evaluation.
The data structure I use is a Map Int (Map Int Text), where Text is from Data.Text. The problem is that the function that builds this map creates a huge thunk. On an input text of about 3 MB, the program requires more than 250 MB of memory.
Now to the real purpose of this question:
To get the number of characters in this data structure, I use the following function:

    type TextResource = M.Map Int (M.Map Int T.Text)

    totalSize :: TextResource -> Int
    totalSize = M.fold ((+) . (M.fold ((+) . T.length) 0)) 0

It's not pretty, but it does the job. I call this function in main right after creating the TextResource. Interestingly, when I profile the program with the RTS option -hr or -hc, memory usage drops to 70 or 50 MB after a while, which would be perfectly fine.
Unfortunately, this only happens when I use both the profiling options and the totalSize function. Without them, memory usage goes back up to 250 MB.
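For comparison, here is a minimal sketch of a strict variant of totalSize. It swaps the lazy folds for M.foldl', which forces the accumulator at every step, so the sum itself cannot build up a chain of thunks. This is an illustration of the technique, not a confirmed fix for the memory behaviour described above.

```haskell
import qualified Data.Map as M
import qualified Data.Text as T

type TextResource = M.Map Int (M.Map Int T.Text)

-- Sum the lengths of all Text values in the nested map.
-- M.foldl' evaluates the accumulator to WHNF at each step,
-- so no chain of pending (+) applications is built.
totalSize :: TextResource -> Int
totalSize = M.foldl' (\acc inner -> acc + innerSize inner) 0
  where
    innerSize = M.foldl' (\acc t -> acc + T.length t) 0
```

Because T.length has to inspect each Text, this traversal also forces every value stored in the map as a side effect.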
I uploaded the program (<70 lines) along with a test file and a cabal file so you can try it yourself: Link
The test.xml file is a generated XML file that should be placed in the same directory as the executable. To build, cabal configure --enable-executable-profiling followed by cabal build should be enough (provided you have the profiling versions of the required libraries installed).
You can see the difference by running the program once with +RTS -hc and once without it.
It would be great if someone could run the program, since I am really stuck here. I already tried putting deepseq in several places, but nothing works (well, except for using the profiling options).
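One detail worth checking when deepseq seems to have no effect: an expression such as x `deepseq` y only forces x when that expression is itself evaluated. A common way to tie the forcing to a definite point in IO is evaluate . force. The sketch below assumes a hypothetical builder, buildResource, that takes (outer key, inner key, text) triples; the real program builds its map from XML, so treat the names and input shape as placeholders.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import Data.List (foldl')
import qualified Data.Map.Strict as M
import qualified Data.Text as T

type TextResource = M.Map Int (M.Map Int T.Text)

-- Hypothetical builder (not the original buildTextFile).
-- Data.Map.Strict evaluates each inserted value to WHNF,
-- and foldl' keeps the accumulated map evaluated as it grows.
buildResource :: [(Int, Int, T.Text)] -> TextResource
buildResource = foldl' step M.empty
  where
    step acc (i, j, t) = M.insertWith M.union i (M.singleton j t) acc

-- Fully evaluate the structure as part of the IO action,
-- so nothing lazy remains once this action has run.
buildResourceStrict :: [(Int, Int, T.Text)] -> IO TextResource
buildResourceStrict = evaluate . force . buildResource
```

With this shape, the map is guaranteed to be in normal form by the time the next IO action starts, regardless of whether anything later happens to demand it.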
Edit:
Profiling shows, however, that only ~20 MB of the heap is used, so, as in my comment, I blame GHC for not freeing as much of the GC nursery memory as you would like.
Thanks, that pointed me in the right direction. As it turns out, you can tell GHC to perform a garbage collection (performGC), which works fine after deepseq-ing the map. Although I assume that using performGC is generally not recommended, it seems to be the right tool for the job here.
Edit2: This is how I changed the main function (plus deepseq-ing the return value of buildTextFile):

    main = do
        tf <- buildTextFile "test.xml"
        performGC
        putStrLn . show . text 1 1000 $ tf
        getLine
        putStrLn . show . text 100 1000 $ tf
        return ()
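The pattern above, forcing the structure and then calling performGC, can be sketched in a self-contained form as follows. Since buildTextFile is not shown here, the sketch uses a hypothetical stand-in, buildFromText, that splits plain text into lines and words; loadAndCollect is likewise an illustrative name. performGC comes from System.Mem in base.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import qualified Data.Map as M
import qualified Data.Text as T
import System.Mem (performGC)

type TextResource = M.Map Int (M.Map Int T.Text)

-- Hypothetical stand-in for buildTextFile: one inner map per line,
-- keyed by word position. T.copy detaches each word from the input
-- buffer, so the input itself does not stay reachable through slices.
buildFromText :: T.Text -> TextResource
buildFromText input =
  M.fromList
    [ (i, M.fromList (zip [1 ..] (map T.copy (T.words line))))
    | (i, line) <- zip [1 ..] (T.lines input)
    ]

-- Force the whole structure first, then request a major collection,
-- so the garbage produced while building is eligible for collection.
loadAndCollect :: T.Text -> IO TextResource
loadAndCollect input = do
  tf <- evaluate (force (buildFromText input))
  performGC
  return tf
```

The order matters in this sketch: calling performGC before the map has been fully evaluated would leave the unevaluated thunks, and everything they reference, alive.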