When executing the Clojure `first` function

I saw the following example in Rich video on http://blip.tv/file/734409 sequences for about 33-36 minutes:

(first "abcd") => \a 

Now he says that it expands to (view):

 (first "abcd") => (first (seq "abcd")) => (first '(\a \b \c \d)) 

Thus, it looks like an O(N) operation because a full copy of the string is being executed. First of all, if a String is immutable, then why is it copied? (Edit: based on the answer, it probably is not, just looked at this path when printing.) Secondly, suppose that first operates on something else in Java, which is modified, for example, a linked list of integers. Should first act in a lazy way (e.g. create a constant sequence first)? Doesn't it make sense to evaluate it right away and save it? It would be some kind of hack that would break a good abstraction, but, I think, quickly do this work. When you call (seq "abcd") , you do not know how it will be used. When you call first on seq , you know what to do. But, when you call first on "abcd" , I think that doing hacker and quick "captures and saving" approaches is better than capturing a sequence and then calling first .

Am I missing something? Did Rick Hickey skip a few steps?

Let me know if I have any questions. Thanks!

+7
performance clojure
source share
3 answers

Other answers are correct, but I will take this chance to point out the interesting effect of the Clojure philosophy of immutability.

Thus, it looks like an O (N) operation because a full copy of the string is being executed.

Rows, like other data structures in Clojure, are immutable. (This is a special case implemented in the JVM, not Clojure, but inconsequential up to this point.)

Copies of immutable objects are free. That's right, free. Because you do not need to copy at all. The same selected object in memory can simply be reused, because it must always be the same as when it was “copied”.

Thus, a function like seq should never copy anything. It just works on what is passed directly, and returns a (possibly lazy) sequence that provides an abstract interface to what you called seq on.

So seq always O (1).

+4
source share

It does not follow that a full copy of the string is running. (But this is a good question.)

In the clojure.lang.RT source , you will notice that the runtime uses charsequence to create a sequence:

 static ISeq seqFrom(Object coll){ if(coll instanceof Seqable) return ((Seqable) coll).seq(); else if(coll == null) return null; else if(coll instanceof Iterable) return IteratorSeq.create(((Iterable) coll).iterator()); else if(coll.getClass().isArray()) return ArraySeq.createFromObject(coll); else if(coll instanceof CharSequence) return StringSeq.create((CharSequence) coll); . . . 

So this is a Java question, not a clojure question. I have not tested, but I am pretty sure that CharSequence is doing the “right thing”.

+6
source share

You can look at seq as if it were creating a lazy string-based sequence.

This does not actually create a lazy sequence, it is actually a short circuit to the CharSequence Java implementation, but it is an implementation detail.

In terms of complexity, first is O(1) on seq , and you can create seq in constant time from anything iterable.

+3
source share

All Articles