Before experimenting, I added the following line to project.clj:
:jvm-opts ^:replace [] ; Makes measurements more accurate
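For context, here is a minimal sketch of where that line sits in project.clj. By default Leiningen passes JVM options tuned for fast startup rather than peak speed, and `^:replace []` clears them. The project name and the dependency versions are assumptions for illustration; `quick-bench` used in the measurements comes from the Criterium library.

```clojure
;; project.clj (sketch; project name and versions are hypothetical)
(defproject sum-bench "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.3"]
                 [criterium "0.4.6"]]   ; provides criterium.core/quick-bench
  ;; Drop Leiningen's default JVM options so the JIT can fully optimize.
  :jvm-opts ^:replace [])
```

With this in place, `(require '[criterium.core :refer [quick-bench]])` makes `quick-bench` available at the REPL.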
The main measurements:
(def a (double-array (range 1000000))) ; 10 is too small for performance measurements
(quick-bench (sum-of-squares a))
; ... Execution time mean : 27.617748 ms ...
(quick-bench (sum-of-squares2 a))
; ... Execution time mean : 1.259175 ms ...
This more or less matches the time difference in the question. Now let's try avoiding Java arrays (which are not really idiomatic in Clojure):
(def b (mapv (partial * 1.0) (range 1000000))) ; Persistent vector
(quick-bench (sum-of-squares b))
; ... Execution time mean : 14.808644 ms ...
Almost twice as fast as sum-of-squares on the array. Now let's remove the type hints:
(defn sum-of-squares3
  "Given a vector v, compute the sum of the squares of elements."
  [v]
  (r/fold + (r/map
Runtime has increased only slightly compared to the version with type hints. By the way, the transducers version has very similar performance and is much cleaner:
(defn sum-of-squares3 [v] (transduce (map
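For reference, a complete hint-free version in both styles might look like the following sketch. The function names and the anonymous squaring function are assumptions made for illustration, not necessarily the exact code measured above.

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical names, chosen so they do not clash with the functions above.

(defn sum-of-squares-fold
  "Sum of squares via reducers; r/fold can work in parallel on large vectors."
  [v]
  (r/fold + (r/map (fn [x] (* x x)) v)))

(defn sum-of-squares-xf
  "Sum of squares via a transducer; no intermediate sequence is built."
  [v]
  (transduce (map (fn [x] (* x x))) + v))

(sum-of-squares-fold [1.0 2.0 3.0]) ; => 14.0
(sum-of-squares-xf [1.0 2.0 3.0])   ; => 14.0
```

Both versions use `+` as the reducing function, whose zero-argument call supplies the initial value.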
Now about additional type hints. We can indeed optimize the first implementation of sum-of-squares:
(defn square ^double [^double x] (* x x))
(defn sum-of-squares4
  "Given a vector v, compute the sum of the squares of elements."
  [v]
  (r/fold + (r/map square v)))
(quick-bench (sum-of-squares4 b))
; ... Execution time mean : 12.891831 ms ...
; Type-hinted replacement for + (all three arities that r/fold needs)
(defn pl
  (^double [] 0.0)
  (^double [^double x] (+ x))
  (^double [^double x ^double y] (+ x y)))
(defn sum-of-squares5
  "Given a vector v, compute the sum of the squares of elements."
  [v]
  (r/fold pl (r/map square v)))
(quick-bench (sum-of-squares5 b))
; ... Execution time mean : 9.441748 ms ...
Note 1: adding type hints to the arguments and return value of sum-of-squares4 and sum-of-squares5 themselves gives no additional performance benefit.
Note 2: it is usually not a good idea to start with optimization. The straightforward version (apply + (map square v)) will perform well enough for most situations. sum-of-squares2 is very far from idiomatic and uses literally no Clojure concepts. If this really is performance-critical code, it is better to implement it in Java and call it through interop. The code will be much cleaner, despite involving two languages. You could even implement it in unmanaged code (C, C++) and use JNI (harder to maintain, but if implemented correctly it can give the best performance).
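As a minimal sketch of that straightforward version, with no type hints and no reducers (the name sum-of-squares-simple is hypothetical):

```clojure
(defn square
  "Square of x."
  [x]
  (* x x))

(defn sum-of-squares-simple
  "Given a collection v, compute the sum of the squares of its elements."
  [v]
  (apply + (map square v)))

(sum-of-squares-simple [1.0 2.0 3.0]) ; => 14.0
```

It works on any sequential collection, so it is a reasonable starting point; the hinted variants are only worth the extra code once profiling shows this function is a bottleneck.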