Clojure Collection Operations

Question

I am new to Clojure, although I am familiar with functional languages, mainly Scala.

I'm trying to figure out what the idiomatic way of working with collections in Clojure is. I am particularly confused by the behavior of functions such as map.

Scala pays great attention to ensuring that map always returns a collection of the same type as the original collection, if that makes sense:

 List(1, 2, 3) map (2 *) == List(2, 4, 6)
 Set(1, 2, 3) map (2 *) == Set(2, 4, 6)
 Vector(1, 2, 3) map (2 *) == Vector(2, 4, 6)

In Clojure, by contrast, as I understand it, most operations such as map or filter are lazy, even when they are called on eager data structures. This has the odd result that

 (map #(* 2 %) [1 2 3])

is a lazy sequence instead of a vector.

While I generally prefer lazy operations, I find this confusing. After all, vectors guarantee certain performance characteristics that lists do not.

Let's say I take the result above and append to the end of it. If I understand correctly, the result is not evaluated until I try to append to it; then it is evaluated, and I get a list instead of a vector, so I have to traverse it to add to the end. Of course, I could turn it into a vector afterwards, but that gets messy and is easy to overlook.

As far as I understand, map is polymorphic, and it would not be hard to implement it so that it returns a vector on vectors, a list on lists, a stream on streams (this time with lazy semantics), and so on. I think I'm missing something about the basic design of Clojure and its idioms.

What is the reason that basic operations on Clojure data structures do not preserve the type of the structure?

collections clojure lazy-evaluation

Andrea Jan 2 '13 at 17:02

1 answer

Michiel Borkent · Accepted Answer · 2013-01-02T17:47:06+0000

In Clojure, many functions are based on the seq abstraction. The advantage of this approach is that you do not need to write a function for every different type of collection: as long as your collection can be viewed as a sequence (something with a head and, possibly, a tail), you can use it with all seq functions. Functions that take seqs and output seqs are much more composable, and therefore more reusable, than functions that restrict their use to a particular type of collection. When writing your own function on seqs, you do not need to handle special cases such as "if the user gives me a vector, I have to return a vector". Your function will fit into a seq pipeline just as well as any other seq function.
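As a minimal sketch of that point (double-all is a hypothetical helper named here only for illustration), a single function written against the seq abstraction accepts a vector, a list, or a set, and the caller decides afterwards which concrete collection to pour the result into:

```clojure
;; Hypothetical helper, for illustration only: doubles every element of
;; anything that can be viewed as a seq.
(defn double-all [coll]
  (map #(* 2 %) coll))

(double-all [1 2 3])    ; vector in, lazy seq out: (2 4 6)
(double-all '(1 2 3))   ; list in, lazy seq out: (2 4 6)
(double-all #{1 2 3})   ; set in, lazy seq out (element order not guaranteed)

;; Pour the result back into a concrete collection when the type matters.
(into [] (double-all [1 2 3]))    ; => [2 4 6]
(into #{} (double-all #{1 2 3}))  ; => #{2 4 6}
```

The function itself never inspects the input type; choosing the output collection with into stays at the call site.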

The reason map returns a lazy seq is a design choice. Clojure uses laziness by default for many of these functional constructs. If you want other behavior, for example parallelism without intermediate collections, look at the reducers library: http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html
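A rough sketch of what that looks like, assuming Clojure 1.5 or later, where clojure.core.reducers ships with the language (the names v and doubled-evens are made up for this example):

```clojure
(require '[clojure.core.reducers :as r])

(def v (vec (range 10)))

;; r/filter and r/map describe the transformation; no intermediate
;; collection is built at this point.
(def doubled-evens (r/map #(* 2 %) (r/filter even? v)))

;; The work happens when the result is reduced or poured into a collection.
(into [] doubled-evens)    ; => [0 4 8 12 16]
(r/fold + doubled-evens)   ; => 40, may run in parallel on large vectors
```

For a ten-element vector r/fold will not actually split the work; the point is only that the same pipeline can be reduced eagerly, and in parallel for large inputs, without building lazy seqs in between.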

In terms of performance, map always has to apply the function n times, to every item of the collection from first to last, so its performance will always be O(n) or worse. In that respect, vector versus list does not matter. The possible benefit laziness gives you shows up when you consume only the first part of the result. If you have to append something to the end of the output of map, a vector is indeed more efficient. You can use mapv (added in Clojure 1.4) in this case: it takes a collection and returns a vector. I would say only worry about this kind of performance optimization if you have a very good reason to. In most cases, it is not worth it.
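To make the difference concrete, here is a small sketch (lazy-result and vec-result are names chosen only for this example). Note that conj adds at the front of a seq but at the end of a vector, which is exactly why the result type matters when appending:

```clojure
(def lazy-result (map  #(* 2 %) [1 2 3]))   ; lazy seq
(def vec-result  (mapv #(* 2 %) [1 2 3]))   ; vector, built eagerly

(conj lazy-result 0)   ; => (0 2 4 6), conj on a seq adds at the front
(conj vec-result 8)    ; => [2 4 6 8], conj on a vector appends at the end

;; Laziness pays off when only a prefix of the result is consumed,
;; here even on an infinite input.
(take 3 (map #(* 2 %) (range)))   ; => (0 2 4)
```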

Read more about seq abstraction here: http://clojure.org/sequences

Another higher-order function that returns a vector, also added in Clojure 1.4, is filterv.
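For instance, as a quick sketch, filterv composes with mapv so that every step of a small pipeline stays a vector:

```clojure
(filterv odd? [1 2 3 4 5])              ; => [1 3 5]
(vector? (filterv odd? [1 2 3 4 5]))    ; => true
(mapv inc (filterv odd? [1 2 3 4 5]))   ; => [2 4 6]
```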
