Why do Scala vectors have a branching factor of 32?

Why do Scala vectors have a branching factor of 32 rather than some other number? Wouldn't a smaller branching factor allow more structural sharing? Clojure seems to use the same branching factor. Is there anything magical about 32 that I am missing?

+7
4 answers

It helps to start with what the branching factor is:

The branching factor of a tree or graph is the number of children at each node.

So the answer seems to be mostly here:

http://www.scala-lang.org/docu/files/collections-api/collections_15.html

Vectors are represented as trees with a high branching factor. Each tree node contains up to 32 elements of the vector or up to 32 other tree nodes. Vectors with up to 32 elements can be represented in a single node. Vectors with up to 32 * 32 = 1024 elements can be represented with a single indirection. Two hops from the root of the tree to the final element node are sufficient for vectors with up to 2^15 elements, three hops for vectors with up to 2^20 elements, four hops for vectors with up to 2^25 elements, and five hops for vectors with up to 2^30 elements. So for all vectors of reasonable size, selecting an element involves up to 5 primitive array selections. This is what we meant when we wrote that element access is "effectively constant time".
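
To make the arithmetic in that quote concrete, here is a minimal sketch (not the actual Vector implementation, just an illustration of the indexing scheme): each level of a 32-way trie consumes 5 bits of the index, because 32 = 2^5.

```scala
// Sketch of 32-way trie indexing: split an index into 5-bit digits,
// one digit (0..31) per tree level, starting from the root.
def digits(index: Int, depth: Int): Seq[Int] =
  (depth - 1 to 0 by -1).map(level => (index >>> (5 * level)) & 31)

// A vector of about 2^20 elements needs 4 levels (32^4 = 2^20), so an
// element access touches at most 4 small arrays:
digits(123456, depth = 4)   // Vector(3, 24, 18, 0) -- one slot per level
```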

So, basically, they had to make a design decision about how many children each node should have. As they explain, 32 seemed reasonable, but if you find it too restrictive, you can always write your own class.

For more on why it is 32 in particular, you can look at this paper; its introduction makes the same point as above about "effectively constant time", although the paper is about Clojure rather than Scala.

http://infoscience.epfl.ch/record/169879/files/RMTrees.pdf

+13
James Black's answer is correct. Another argument in favor of choosing 32 elements is that the cache line size on many modern processors is 64 bytes, so two cache lines can hold 32 entries of 4 bytes each: 32 ints, or 32 pointers on a 32-bit machine or on a 64-bit JVM with a heap of up to 32 GB thanks to compressed pointers.
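
A quick back-of-the-envelope check on that arithmetic (the 64-byte line and 4-byte slot sizes are the assumptions stated above, not universal constants):

```scala
// Two 64-byte cache lines hold exactly one 32-slot node of 4-byte entries
// (Ints, 32-bit pointers, or compressed oops on a <= 32 GB heap).
val cacheLineBytes = 64
val slotBytes      = 4
val slotsPerNode   = 2 * cacheLineBytes / slotBytes   // = 32
```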
+8

It is "effectively constant time" for updates, too. With such a large branching factor, you never have to go beyond about 5 levels, even for terabyte-scale vectors. Here's a video where Rich Hickey discusses this and other aspects of Clojure on Channel 9: http://channel9.msdn.com/Shows/Going+Deep/Expert-to-Expert-Rich-Hickey-and-Brian-Beckman-Inside-Clojure
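
To see the "effectively constant time" update in action with the standard library (the node-copying description is the usual explanation of how the trie shares structure; exact node counts are an implementation detail):

```scala
// Updating one element copies only the ~log32(n) nodes on the path from the
// root to the changed leaf; everything else is shared with the old vector.
val v = Vector.tabulate(1 << 20)(identity)   // about a million elements
val w = v.updated(123456, -1)                // copies a handful of 32-wide nodes

w(123456)   // -1
v(123456)   // 123456 -- the original vector is unchanged (persistence)
```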

+4

Just to add a little to James' answer.

From the point of view of algorithm analysis, O(log_b(N)) = O(log_k(N)) for any constant bases b and k, since both functions grow logarithmically and differ only by a constant factor.

But in practical applications, log_32(N) hops is a much smaller number of indirections than, say, base 2, which is enough to keep it close to constant time even for rather large values of N.
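
A rough comparison of the number of hops for a billion elements (a sketch; "depth" here is just ceil(log_b n), ignoring implementation details):

```scala
// Tree depth needed to address n elements with branching factor b.
def depth(n: Long, b: Int): Int =
  math.ceil(math.log(n.toDouble) / math.log(b.toDouble)).toInt

depth(1000000000L, 2)    // 30 levels for a binary tree
depth(1000000000L, 32)   // 6 levels for a 32-way trie
```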

I am sure they chose exactly 32 (as opposed to an even larger number) because of memory block sizes, but the main reason is the smaller number of hops compared to smaller branching factors.

I also recommend watching this presentation on InfoQ, where Daniel Spiewak discusses vectors starting at about the 30-minute mark: http://www.infoq.com/presentations/Functional-Data-Structures-in-Scala

+4
