Why does inserting 1,000,000 values into a transient map in Clojure give a map with only 8 elements?

If I call assoc! 1,000,000 times on a transient vector, I get a vector of 1,000,000 elements:

    (count (let [m (transient [])] (dotimes [i 1000000] (assoc! m i i)) (persistent! m))) ; => 1000000

On the other hand, if I do the same with a map, it contains only 8 elements:

    (count (let [m (transient {})] (dotimes [i 1000000] (assoc! m i i)) (persistent! m))) ; => 8

Is there a reason why this is happening?

2 answers

Operations on transient data types do not guarantee that they return the same reference that was passed in. Sometimes the implementation may decide to return a new (but still transient) map after assoc! instead of reusing the one you passed in.
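To see the effect in isolation, here is a small sketch that repeats the question's pattern for different sizes. The function name `count-ignoring-return` is hypothetical, chosen only for illustration, and the result assumes the 8-entry array-map threshold of current Clojure versions: the count grows normally up to 8 and then stops.

```clojure
;; Hypothetical helper: repeats the question's pattern of calling assoc!
;; only for its side effect and ignoring the returned transient.
(defn count-ignoring-return [n]
  (let [m (transient {})]
    (dotimes [i n]
      (assoc! m i i))          ; return value discarded: this is the bug
    (count (persistent! m))))

(count-ignoring-return 5)      ;=> 5  (every key fits in the array-backed transient)
(count-ignoring-return 100)    ;=> 8  (stuck at the array-map capacity)
```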

The ClojureDocs page on assoc! has a good example that explains this behavior:

    ;; The key concept to understand here is that transients are
    ;; not meant to be `bashed in place`; always use the value
    ;; returned by either assoc! or other functions that operate
    ;; on transients.

    (defn merge2
      "An example implementation of `merge` using transients."
      [x y]
      (persistent! (reduce
                    (fn [res [k v]] (assoc! res k v))
                    (transient x)
                    y)))

    ;; Why always use the return value, and not the original? Because the return
    ;; value might be a different object than the original. The implementation
    ;; of Clojure transients in some cases changes the internal representation
    ;; of a transient collection (e.g. when it reaches a certain size). In such
    ;; cases, if you continue to try modifying the original object, the results
    ;; will be incorrect.

    ;; Think of transients like persistent collections in how you write code to
    ;; update them, except unlike persistent collections, the original collection
    ;; you passed in should be treated as having an undefined value. Only the return
    ;; value is predictable.
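A quick usage check of the `merge2` example (copied here so the snippet is self-contained): like `merge`, later values win for duplicate keys, and because it always uses the value returned by `assoc!`, it keeps working past the 8-entry threshold.

```clojure
(defn merge2
  "An example implementation of `merge` using transients."
  [x y]
  (persistent! (reduce (fn [res [k v]] (assoc! res k v))
                       (transient x)
                       y)))

(merge2 {:a 1 :b 2} {:b 3 :c 4})                       ;=> {:a 1, :b 3, :c 4}
;; 100 entries: well past the array-map capacity, nothing is lost.
(count (merge2 {} (zipmap (range 100) (range 100))))   ;=> 100
```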

I'd like to repeat that last part because it is very important: the original collection you passed in should be treated as having an undefined value. Only the return value is predictable.
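If you prefer an explicit loop over `reduce`, the same rule applies: thread the value returned by `assoc!` into the next iteration. A minimal sketch with `loop`/`recur`; the name `build-map` is hypothetical:

```clojure
;; Hypothetical helper: builds {0 0, 1 1, ..., (n-1) (n-1)} with a transient,
;; always rebinding `m` to the value assoc! returns.
(defn build-map [n]
  (loop [m (transient {})
         i 0]
    (if (< i n)
      (recur (assoc! m i i) (inc i))   ; use the returned transient next time
      (persistent! m))))

(count (build-map 1000000))   ;=> 1000000
```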

Here's a modified version of your code that works as expected:

    (count (let [m (transient {})]
             (persistent! (reduce (fn [acc i] (assoc! acc i i))
                                  m
                                  (range 1000000)))))
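In practice you often don't need to manage the transient yourself: `into` already uses a transient internally when the target collection supports it, and it always threads the return values correctly. An equivalent one-liner, using the transducer arity of `into` (Clojure 1.7+):

```clojure
;; `into` builds the map with transients under the hood.
(count (into {} (map (fn [i] [i i])) (range 1000000)))   ;=> 1000000
```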

As a side note, the reason you always get 8 is that Clojure uses clojure.lang.PersistentArrayMap (a map backed by an array) for maps with 8 or fewer entries. Once you go past 8, it switches to clojure.lang.PersistentHashMap.

    user=> (type '{1 a 2 a 3 a 4 a 5 a 6 a 7 a 8 a})
    clojure.lang.PersistentArrayMap
    user=> (type '{1 a 2 a 3 a 4 a 5 a 6 a 7 a 8 a 9 a})
    clojure.lang.PersistentHashMap
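The same threshold applies to maps produced via transients. This sketch builds maps of 8 and 9 entries (correctly capturing every return value) and checks their concrete types. The helper name `map-of-size` is hypothetical, and the 8-entry cutoff is an implementation detail of current Clojure versions rather than a documented guarantee.

```clojure
;; Hypothetical helper: builds a map with n entries through a transient,
;; capturing every value returned by assoc!.
(defn map-of-size [n]
  (persistent! (reduce (fn [acc i] (assoc! acc i i))
                       (transient {})
                       (range n))))

(type (map-of-size 8))   ;=> clojure.lang.PersistentArrayMap
(type (map-of-size 9))   ;=> clojure.lang.PersistentHashMap
```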

Once you exceed 8 entries, your transient map switches its backing data structure from an array of key/value pairs (PersistentArrayMap) to a hash array mapped trie (PersistentHashMap), and from then on assoc! returns a new reference instead of just updating the old one.
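You can observe the switch directly by comparing object identity with `identical?`. In this sketch (again relying on current implementation details, and using the hypothetical var name `switch-demo`), the first 8 `assoc!` calls return the very same transient object, while the 9th returns a different object of another class:

```clojure
(def switch-demo
  (let [t  (transient {})
        ;; 8 entries fit in the array-backed transient, so assoc! returns `t` itself
        t8 (reduce (fn [acc i] (assoc! acc i i)) t (range 8))
        ;; the 9th entry forces a switch to a hash-map-backed transient
        t9 (assoc! t8 8 8)]
    [(identical? t t8)
     (identical? t8 t9)
     (.getSimpleName (class t8))
     (.getSimpleName (class t9))]))

switch-demo   ;=> [true false "TransientArrayMap" "TransientHashMap"]
```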


The simplest explanation comes from the Clojure documentation (emphasis mine):

Transients support a parallel set of "changing" operations, with similar names followed by ! - assoc!, conj! etc. These do the same things as their persistent counterparts except the return values are themselves transient. Note in particular that transients are not designed to be bashed in-place. You must capture and use the return value in the next call.
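Because every `!` operation returns the transient to use next, the threading macro `->` is a natural fit for a short, fixed sequence of updates: each step receives exactly the value the previous step returned. A small sketch (the var name `threaded-result` is hypothetical):

```clojure
;; Each step receives the transient returned by the previous one.
(def threaded-result
  (-> (transient {})
      (assoc! :a 1)
      (assoc! :b 2)
      (conj! [:c 3])      ; conj! on a transient map accepts a [k v] pair
      (dissoc! :a)
      persistent!))

;; threaded-result is a map equal to {:b 2, :c 3}
```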
