Clojure reducer library performance tuning

Why is display / reduction using a reducer library worse than a normal map / reduction?

user=> (time (reduce + (map inc (range 100000000)))) "Elapsed time: 14412.627966 msecs" 5000000050000000 user=> (time (reduce + (r/map inc (range 100000000)))) ... (Cc) user=> (time (r/reduce + (r/map inc (range 100000000)))) ....(Cc) 

I had two to kill the last two, because it lasts forever. What is wrong here?

EDIT: It seems that other languages ​​also have similar problems. Scala only seems to be broken into a million. Why do Scala concurrent collections sometimes raise an OutOfMemoryError? . While clojure gearboxes are faster than usual by a million.

+6
source share
3 answers

To complement @ a-webb's answer, this is a compiler error and one involved, which is really fixed. ( See this post for more details. )

Another way to solve the problem is to use a fuse:

 (defn once-seq "Returns a sequence on which seq can be called only once." [coll] (let [a (atom coll)] (reify clojure.lang.Seqable (seq [_] (let [coll @a] (reset! a nil) (seq coll)))))) 

and then:

 => (time (r/reduce + (r/map inc (once-seq (range 100000000))))) "Elapsed time: 3692.765 msecs" 5000000050000000 
+6
source

Performance is interrupted when memory is exhausted. If you keep waiting, most likely you will encounter a memory error. Creating a gear keeps the collection head in close. Thus, a huge lazy sequence links memory as it is realized.

That's what happens distilled

 user=> (def n 100000000) #'user/n user=> (time (dorun (range n))) "Elapsed time: 12349.034702 msecs" 

Now the same thing, but from within the closure

 user=> (defn foo [x] (fn [f] (fx))) #'user/foo user=> (time ((foo (range n)) dorun)) OutOfMemoryError GC overhead limit exceeded ... (sometime later) 

Compare with

 (time (do (doall (range n)) nil)) OutOfMemoryError GC overhead limit exceeded ... (sometime later) 

Suspicious circuit in gearboxes

 user=> (source r/folder) (defn folder "Given a foldable collection, [...]" {:added "1.5"} ([coll xf] (reify clojure.core.protocols/CollReduce (coll-reduce [_ f1] (clojure.core.protocols/coll-reduce coll (xf f1) (f1))) ... 

Christophe Grand has a good post on how to make gears in a lazy fashion.

+2
source

Reducers don't work very well with lazy lists, while normal shortening does.

To get real benefits from gearboxes, you need a lazy collection, for example. vector, and you need to use fold instead of decrement.

  (def v (into [] (range 10000000))) (time (reduce + (map inc v))) ;; "Elapsed time: 896.79 msecs" (time (reduce + (r/map inc v))) ;; "Elapsed time: 671.947 msecs" (time (r/fold + (r/map inc v))) ;; "Elapsed time: 229.45 msecs" 

Gearboxes work with fork / join frameworks, which require large chunks of data. In a lazy sequence (chunked) you don't have these large chunks, so fork / join may not work well.

Rich Hickey talks about gearboxes that explain concepts well: https://vimeo.com/45561411

+1
source

All Articles