Currently, GHC on Windows is a 32-bit GHC; I think a 64-bit GHC for Windows should become available when 7.6 arrives.
One consequence is that on Windows you cannot use more than 4G - 1BLOCK of memory, since the maximum allowed as a size parameter is HS_WORD_MAX:
decodeSize(rts_argv[arg], 2, BLOCK_SIZE, HS_WORD_MAX) / BLOCK_SIZE;
With 32-bit words, HS_WORD_MAX = 2^32 - 1.
That explains why
    ./mem.exe 42000000 +RTS -s -M4G
fails with
    -M4G: size out of range
since decodeSize() decodes 4G as 2^32.
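The range check can be illustrated with a small sketch. This is not GHC's actual decodeSize(): the name decode_size_sketch, the WORD32_MAX macro and the -1.0 return convention are hypothetical. Only the K/M/G suffix scaling and the comparison against the largest 32-bit word are modelled.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the RTS size parser (not GHC's code).
 * Parses "<number>[K|M|G]" and returns the size in bytes, or -1.0 if the
 * result is larger than max. */
double decode_size_sketch(const char *s, double max)
{
    char *end;
    double val = strtod(s, &end);

    /* Scale by the binary suffix, if any. */
    switch (*end) {
    case 'G': case 'g': val *= 1024.0 * 1024.0 * 1024.0; break;
    case 'M': case 'm': val *= 1024.0 * 1024.0;          break;
    case 'K': case 'k': val *= 1024.0;                   break;
    default:                                             break;
    }
    return (val > max) ? -1.0 : val;
}

/* Largest value of a 32-bit word: 2^32 - 1. */
#define WORD32_MAX ((double)UINT32_MAX)
```

Under these assumptions, decode_size_sketch("4G", WORD32_MAX) is rejected because 4 * 2^30 = 2^32 exceeds 2^32 - 1, while "4095M" is still accepted.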
This limitation will remain even after you upgrade your GHC, until a 64-bit GHC for Windows is released.
A 32-bit process has its user-mode virtual address space limited to 2 or 4 GB (depending on the state of the IMAGE_FILE_LARGE_ADDRESS_AWARE flag), cf. Memory limits for Windows editions.
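Whether that flag is set can be read from the executable's header. The following is a minimal sketch, assuming the first bytes of the .exe have already been read into a buffer; the function name is_large_address_aware and its return convention are hypothetical. It relies on the PE layout: the offset of the "PE\0\0" signature is stored at 0x3C, and Characteristics is the last 16-bit field of the 20-byte COFF file header that follows the signature.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LARGE_ADDRESS_AWARE_BIT 0x0020u /* value of IMAGE_FILE_LARGE_ADDRESS_AWARE */

/* Returns 1 if the flag is set, 0 if it is clear, and -1 if buf does not
 * look like a PE image. buf holds the first len bytes of the executable. */
int is_large_address_aware(const unsigned char *buf, size_t len)
{
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;

    /* e_lfanew: little-endian offset of the "PE\0\0" signature, stored at 0x3C. */
    uint32_t pe = (uint32_t)buf[0x3C] | ((uint32_t)buf[0x3D] << 8) |
                  ((uint32_t)buf[0x3E] << 16) | ((uint32_t)buf[0x3F] << 24);

    /* The 4-byte signature plus the 20-byte COFF file header must fit. */
    if ((size_t)pe > len || len - pe < 24 || memcmp(buf + pe, "PE\0\0", 4) != 0)
        return -1;

    /* Characteristics sits at offset 18 of the COFF header, i.e. pe + 4 + 18. */
    uint16_t characteristics = (uint16_t)(buf[pe + 22] | (buf[pe + 23] << 8));

    return (characteristics & LARGE_ADDRESS_AWARE_BIT) ? 1 : 0;
}
```

A result of 0 for mem.exe would be consistent with the process being limited to 2 GB of address space.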
Now, you are trying to build a Set containing 42 million 4-byte Ints. A Data.Set.Set has five words of overhead per element (constructor, size, left and right subtree pointers, pointer to the element), so the Set will take up about 0.94 GiB of memory (1.008 'metric' GB). But the process uses about twice that or more, because the garbage collector needs additional space at least the size of the live heap.
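The estimate can be reproduced with a few lines of arithmetic. This sketch follows the accounting in the text (five overhead words plus one word for the Int itself, six words per element); the function names and the WORDS_PER_ELEMENT constant are made up for illustration and do not measure GHC's actual heap layout.

```c
#include <stdint.h>

/* Words per Set element, following the accounting in the text:
 * five words of overhead plus one word for the Int itself. */
#define WORDS_PER_ELEMENT 6u

/* Estimated size of the Set in bytes for a given word size. */
uint64_t set_bytes(uint64_t elements, uint64_t word_size_bytes)
{
    return elements * WORDS_PER_ELEMENT * word_size_bytes;
}

/* Convert bytes to GiB (2^30 bytes). */
double bytes_to_gib(uint64_t bytes)
{
    return (double)bytes / (1024.0 * 1024.0 * 1024.0);
}
```

set_bytes(42000000, 4) gives 1,008,000,000 bytes, about 0.94 GiB. The same total results from 21000000 elements with 8-byte words, which is why halving the input makes a 64-bit run comparable.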
Running the program on my 64-bit Linux with an input of 21000000 (to make up for Ints and pointers being twice as large), I get
    $ ./mem +RTS -s -RTS 21000000
    min: 0
    max: 21000000
      31,330,814,200 bytes allocated in the heap
       4,708,535,032 bytes copied during GC
       1,157,426,280 bytes maximum residency (12 sample(s))
          13,669,312 bytes maximum slop
                2261 MB total memory in use (0 MB lost due to fragmentation)

                                        Tot time (elapsed)  Avg pause  Max pause
      Gen  0     59971 colls,     0 par    2.73s    2.73s     0.0000s    0.0003s
      Gen  1        12 colls,     0 par    3.31s   10.38s     0.8654s    8.8131s

      INIT    time    0.00s  (  0.00s elapsed)
      MUT     time   12.12s  ( 13.33s elapsed)
      GC      time    6.03s  ( 13.12s elapsed)
      EXIT    time    0.00s  (  0.00s elapsed)
      Total   time   18.15s  ( 26.45s elapsed)

      %GC     time      33.2%  (49.6% elapsed)

      Alloc rate    2,584,429,494 bytes per MUT second

      Productivity  66.8% of total user, 45.8% of total elapsed
but top reports only 1.1g of memory use; top, and presumably the Task Manager, reports only the live heap.
So it seems that IMAGE_FILE_LARGE_ADDRESS_AWARE is not set, your process is limited to an address space of 2 GB, and the 42-million-element Set needs more than that, unless you specify a maximum or suggested heap size that is smaller:
    $ ./mem +RTS -s -M1800M -RTS 21000000
    min: 0
    max: 21000000
      31,330,814,200 bytes allocated in the heap
       3,551,201,872 bytes copied during GC
       1,157,426,280 bytes maximum residency (12 sample(s))
          13,669,312 bytes maximum slop
                1154 MB total memory in use (0 MB lost due to fragmentation)

                                        Tot time (elapsed)  Avg pause  Max pause
      Gen  0     59971 colls,     0 par    2.70s    2.70s     0.0000s    0.0002s
      Gen  1        12 colls,     0 par    4.23s    4.85s     0.4043s    3.3144s

      INIT    time    0.00s  (  0.00s elapsed)
      MUT     time   11.99s  ( 12.00s elapsed)
      GC      time    6.93s  (  7.55s elapsed)
      EXIT    time    0.00s  (  0.00s elapsed)
      Total   time   18.93s  ( 19.56s elapsed)

      %GC     time      36.6%  (38.6% elapsed)

      Alloc rate    2,611,793,025 bytes per MUT second

      Productivity  63.4% of total user, 61.3% of total elapsed
Setting the maximum heap size below what the program would use naturally actually lets it fit in hardly more than the space needed for the Set, at the price of a slightly longer GC time. Suggesting a heap size with -H1800M instead lets it finish using only
1831 MB total memory in use (0 MB lost due to fragmentation)
So if you specify a maximum heap size below 2 GB (but large enough for the Set to fit), it should work.
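As a rough cross-check of this conclusion, the "total memory in use" figures from the three runs can be compared against a 2 GB address space. The sketch assumes that MB in the RTS output means 2^20 bytes and ignores the part of the address space taken by code, stacks and DLLs, so the real headroom is somewhat smaller. The names fits_below, MIB and USER_SPACE_2GB are hypothetical.

```c
#include <stdint.h>

#define MIB            (1024ull * 1024ull)
#define USER_SPACE_2GB (2048ull * MIB) /* 2^31 bytes */

/* Returns 1 if total_mb (in units of 2^20 bytes) is strictly below the
 * given limit in bytes, 0 otherwise. */
int fits_below(uint64_t total_mb, uint64_t limit_bytes)
{
    return total_mb * MIB < limit_bytes;
}
```

With these assumptions, the unrestricted run (2261 MB) does not fit below 2 GB, while the -M1800M run (1154 MB) and the -H1800M run (1831 MB) do.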