Firstly, there is no tail recursion, but guarded recursion, aka tail recursion modulo minus .
The reason you get stack overflows is because, as others commented, a bunch of storage. But where? One possible culprit is your use of (++) . Although this is not optimal, using (++) does not necessarily lead to thunk overflow and stack overflow. For example, calling
take 2 $ filter (isPrime primes) [15485860..]
should produce [15485863,15485867] as soon as possible and without. But this is all the same code that uses (++) , right?
The problem is that you have two lists that you call primes . One (at the top level) is infinite, co-recursively created through guarded (rather than tail) recursion. The other ( loop argument) is the final list, constructed by adding each newly found stroke to its end, used for testing.
But when it is used for testing, it does not drag out to the end. If this happened, there would be no SO problem. Runs only for sqrt test number . Thus, (++) thunks accumulate in the past.
When isPrime primes 15485863 , it forces the top level of primes to 3935 , which is 547 primes. The list of internal test samples also consists of 547 primes, of which only the first 19 are forced.
But when you call primes !! 1000000 primes !! 1000000 , out of 1,000,000 primes in the duplicate internal list, only 547 are forced. The rest are all in bogs.
If you added new primes to the end of the testing-primes list only when their square was seen among the candidates, the testing-primes list would always be forced completely or almost to the end, and there would be no fist calling SO. And adding (++) to the end of the forced list is not so bad when the next access causes this list to end and leaves no tricks. (He is still copying the list.)
Of course, the top-level primes list can be used directly, as Thomas M. Dubuison shows in his answer.
But the internal list has its uses. When this is correctly implemented, adding new primes to it only when their square is displayed among the candidates, it can allow your program to work in O(sqrt(n)) space when compiling with optimization.