Beginner Question on Heap and Garbage in Clojure

Question

Beginner Question on Heap and Garbage in Clojure

I have a question about Clojure: I'm trying to learn a language by going through Project Euler , and I don’t understand what is going on under the hood: The following code, intended for use, returns a list of all primes to lim . I think it should be O (n) in the heap space, because I am doing a list of all numbers before lim , and then filtering them one by one, moving the first to a new list. (I know that I actually make new lists, each of which is repeated, but I did not think that they would take more memory?) In any case, I run this with

 (defn getAllPrimes [lim] (defn getPrimes [primes numlist] (if (not-empty numlist) ;base case; (recur (cons (first numlist) primes) ;put the prime on to the prime list (filter (fn [x] (not (div? x (first numlist)))) ;remove the prime and all its multiples from the numlist (rest numlist))) primes)); return the primes (getPrimes () (range 2 lim))) ;call the recursive function with and empty prime list to be filled up and a full numlist to be emptied

And I'm running out of empty space when I call

 (apply + (getAllPrimes 2000000))

but I do not go out of space on

 (apply + (filter even? (range 2000000)))

So, I think I should not understand how garbage lists are collected in calls for repetition and actually use a bunch of O (n * n) or something like that.

+7

heap clojure tail-recursion primes

Benjie holson Aug 10 '10 at 23:31

source share

1 answer

dreish · Accepted Answer · 2010-08-11T00:36:27+0000

I believe the problem is that with each repetition you create a new lazy sequence related to the last one, so after several iterations you hold the section that holds the head of seq, which holds the head of seq, which holds the head of seq .... All intermediate sections fill your heap.

Although writing a simple sieve is a useful exercise if you want an answer, Clojure includes a sequence of primes in the standard library: clojure.contrib.lazy-seqs / primes. The standard solution to this given Euler problem is one-line.

As a style point, internal defn is not a good idea. The practical effect is the same as if defn were at the top level, but if I'm not mistaken, var gets reassigned every time getAllPrimes is called, and overriding vars at runtime is highly discouraged. Because the code simply defines var, getPrimes is still as visible as getAllPrimes. In this case, getPrimes can easily be rewritten as a / recur loop without an internal function, anonymous or named. This does not help to solve the problem with the lazy-seqs chain, but makes the code more standard:

 (defn getAllPrimes [lim] (loop [primes () numlist (range 2 lim)] (if (not-empty numlist) (recur (cons (first numlist) primes) (filter (fn [x] (not (div? x (first numlist)))) (rest numlist))) primes)))

I would also avoid using camelCase. The default Clojure name for this function would be get-all-primes.

Returning to the practical problem, however, the least work you could do to get your code to work was to get each section to go through each iteration, i.e. wrap filter call in doall. I tried this, and while it still runs slowly, it at least ends without ending the heap:

 (defn get-all-primes [lim] (loop [primes () numlist (range 2 lim)] (if (not-empty numlist) (recur (cons (first numlist) primes) (doall (filter #(not (div? % (first numlist))) (rest numlist)))) primes)))

Beginner Question on Heap and Garbage in Clojure

More articles: