Clojure Performance, how to enter a prompt in r / map

Below, I have 2 functions calculating the sum of the squares of their arguments. The first is good and functional, but 20 times slower than the second. I assume that the r / map does not take advantage of aget to extract elements from a double array, whereas I explicitly do this in function 2.

Is there any way that I can type again or help r / map r / fold execute faster?

(defn sum-of-squares "Given a vector v, compute the sum of the squares of elements." ^double [^doubles v] (r/fold + (r/map #(* % %) v))) (defn sum-of-squares2 "This is much faster than above. Post to stack-overflow to see." ^double [^doubles v] (loop [val 0.0 i (dec (alength v))] (if (neg? i) val (let [x (aget vi)] (recur (+ val (* xx)) (dec i)))))) (def a (double-array (range 10))) (quick-bench (sum-of-squares a)) 

800 ns

 (quick-bench (sum-of-squares2 a)) 

40 ns

+5
source share
2 answers

Why not use areduce :

 (def sum-of-squares3 ^double [^doubles v] (areduce v idx ret 0.0 (let [item (aget v idx)] (+ ret (* item item))))) 

On my car:

 (criterium/bench (sum-of-squares3 (double-array (range 100000)))) 

Gives an average runtime of 1.809103 ms, your sum-of-squares2 does the same calculation in 1.455775 ms. I think this version using areduce more idiomatic than your version.

To compress slightly higher performance, you can try using unverified math ( add-unchecked and multiply-unchecked ). But be careful, you must be sure that your calculation cannot overflow:

 (defn sum-of-squares4 ^double [^doubles v] (areduce v idx ret 0.0 (let [item (aget v idx)] (unchecked-add ret (unchecked-multiply item item))))) 

Running the same test gives an average runtime of 1.144197 ms. Your sum-of-squares2 can also benefit from untested maths with an average runtime of 1.126001 ms.

+1
source

Before experimenting, I added the following line to project.clj:

 :jvm-opts ^:replace [] ; Makes measurements more accurate 

The main measurements:

 (def a (double-array (range 1000000))) ; 10 is too small for performance measurements (quick-bench (sum-of-squares a)) ; ... Execution time mean : 27.617748 ms ... (quick-bench (sum-of-squares2 a)) ; ... Execution time mean : 1.259175 ms ... 

This more or less corresponds to the time difference in the question. Try not to use Java arrays (which are not really idiomatic for Clojure):

 (def b (mapv (partial * 1.0) (range 1000000))) ; Persistent vector (quick-bench (sum-of-squares b)) ; ... Execution time mean : 14.808644 ms ... 

Almost 2 times faster. Now let's remove the type hints:

 (defn sum-of-squares3 "Given a vector v, compute the sum of the squares of elements." [v] (r/fold + (r/map #(* % %) v))) (quick-bench (sum-of-squares3 a)) ; Execution time mean : 30.392206 ms (quick-bench (sum-of-squares3 b)) ; Execution time mean : 15.583379 ms 

Runtime has increased only slightly compared to the version with type hints. By the way, the transducers version has very similar performance and is much cleaner:

 (defn sum-of-squares3 [v] (transduce (map #(* % %)) + v)) 

Now about the hint of an additional type. We can really optimize the first implementation of sum-of-squares :

 (defn square ^double [^double x] (* xx)) (defn sum-of-squares4 "Given a vector v, compute the sum of the squares of elements." [v] (r/fold + (r/map square v))) (quick-bench (sum-of-squares4 b)) ; ... Execution time mean : 12.891831 ms ... (defn pl (^double [] 0.0) (^double [^double x] (+ x)) (^double [^double x ^double y] (+ xy))) (defn sum-of-squares5 "Given a vector v, compute the sum of the squares of elements." [v] (r/fold pl (r/map square v))) (quick-bench (sum-of-squares5 b)) ; ... Execution time mean : 9.441748 ms ... 

Note # 1: the type of tooltips for the arguments and the return value of sum-of-squares4 and sum-of-squares5 do not have additional performance benefits.

Note No. 2 . It is usually recommended to start with optimization . The direct version (apply + (map square v)) will have good enough performance for most situations. sum-of-squares2 very far from idiomatic and uses literally the concepts of Clojure. If this is really critical performance code, it is best to implement it in Java and use interop. The code will be much cleaner, despite having two languages. Or even implement it in unmanaged code (C, C ++) and use JNI (not supported, but if implemented correctly, it can give the best performance).

+7
source

Source: https://habr.com/ru/post/1212036/


All Articles