The rule of thumb I use in these scenarios (i.e. ones in which you want a single input sequence to produce multiple output sequences) is that, of the following three desirable properties, you can usually have only two:
- Efficiency (traverse the input sequence only once, thus not holding onto its head)
- Laziness (produce items only on demand)
- No shared mutable state
The version in clojure.core chooses (1,3), but gives up (2) by producing each whole partition at once. Python and Haskell both choose (1,2), although this is not immediately obvious: doesn't Haskell have no mutable state at all? Well, its lazy evaluation of everything (not just sequences) means that all expressions are thunks, which start out as blank memory cells and are written only once their value is needed; the implementation of span, as you say, shares the same thunk of span p xs' in both of its output sequences, so whichever of them needs it first "sends" it to the other sequence's result, effecting the action at a distance that is needed to preserve the other nice properties.
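To make the shared-thunk idea concrete in Clojure terms, here is a tiny illustrative sketch (using Clojure's delay as a rough stand-in for a Haskell thunk, which is only an analogy): two consumers close over the same delay, so whichever is forced first does the work, and the other just reads the cached value.

```clojure
;; Two "output" computations sharing one thunk. Whichever is forced first
;; realizes the shared work; the other sees the memoized result.
(def shared (delay (do (println "computing once") (+ 1 2))))

(def consumer-a (delay (* 10 (force shared))))
(def consumer-b (delay (* 100 (force shared))))

(force consumer-b) ; prints "computing once", returns 300
(force consumer-a) ; prints nothing, returns 30
```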
The alternative Clojure implementation you linked to chooses (2,3), as you noted.
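For concreteness, here is roughly what a (2,3) version can look like; this is just a sketch in that spirit, not necessarily the code you linked:

```clojure
;; Fully lazy and free of mutation, but inefficient: f is applied twice per
;; element (once in take-while, once in drop-while), and the head of the
;; current run is retained until the next partition is requested.
(defn naive-partition-by [f coll]
  (lazy-seq
   (when-let [s (seq coll)]
     (let [k (f (first s))]
       (cons (take-while #(= k (f %)) s)
             (naive-partition-by f (drop-while #(= k (f %)) s)))))))
```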
The problem is that for partition-by, giving up either (1) or (2) means you are holding the head of some sequence: either the input's, or one of the outputs'. So if you want a solution in which you can handle arbitrarily large partitions of arbitrarily large input, you need to choose (1,2). There are a few ways you could do this in Clojure:
- Take the Python approach: return something that behaves more like iterators than seqs; seqs make stronger guarantees about non-mutation, promise that you can safely traverse them multiple times, and so on. If instead of a seq of seqs you return an iterator of iterators, then consuming items from any one iterator can freely mutate or invalidate the other(s). This guarantees that consumption happens in order and that memory can be freed.
- Take the Haskell approach: manually thunk everything, with lots of calls to delay, and require the client to call force as often as necessary to get data out. This will be a lot uglier in Clojure, and will greatly increase your stack depth (using this on non-trivial input will probably blow the stack), but it is theoretically possible.
- Write something more Clojure-flavored (but still quite unusual) by having a few mutable data objects that are coordinated between the output sequences, each updated as needed when something is requested from any of them (a rough sketch along these lines, borrowing the iterator flavor of the first option, follows this list).
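Here is a rough, simplified sketch of that last idea; the names (iterish-partition-by, next-partition!, and so on) are made up for the illustration, and it punts on nil/false elements:

```clojure
;; One shared mutable cursor over the input; the "outer iterator" hands out
;; [key next-elem!] pairs, and asking for the next partition first drains
;; whatever is left of the current one. Single pass, no heads retained.
;; Caveat: nil/false elements are not handled, since nil doubles as the
;; "partition finished" signal.
(defn iterish-partition-by [f coll]
  (let [cursor      (atom (seq coll))   ; remaining input
        current-key (atom ::none)]      ; key of the partition being consumed
    (letfn [(next-elem! []
              ;; next element of the current partition, or nil when it ends
              (when-let [s @cursor]
                (let [x (first s)]
                  (when (= @current-key (f x))
                    (swap! cursor next)
                    x))))
            (next-partition! []
              ;; skip anything unconsumed in the current partition, then
              ;; return [key next-elem!] for the next one, or nil at the end
              (while (next-elem!))
              (when-let [s @cursor]
                (reset! current-key (f (first s)))
                [@current-key next-elem!]))]
      next-partition!)))

;; Usage: report the key and size of each partition without holding any seqs.
(let [next-partition! (iterish-partition-by even? [2 4 6 8 1 3 5 10 12 7])]
  (loop [acc []]
    (if-let [[k elem!] (next-partition!)]
      (recur (conj acc [k (count (take-while some? (repeatedly elem!)))]))
      acc)))
;=> [[true 4] [false 3] [true 2] [false 1]]
```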
I'm pretty sure any of these three approaches is possible, but frankly they are all quite hard and not at all natural. Clojure's sequence abstraction simply doesn't make it convenient to produce the data structure you want. My advice: if you need something like this, and the partitions may be too large to fit in memory comfortably, just accept a slightly different format and do a little more bookkeeping yourself: dodge the (1,2,3) dilemma by not producing multiple output sequences at all!
So instead of ((2 4 6 8) (1 3 5) (10 12) (7)), which would be your output format for something like (partition-by even? [2 4 6 8 1 3 5 10 12 7]), you can accept a slightly uglier format: ([::key true] 2 4 6 8 [::key false] 1 3 5 [::key true] 10 12 [::key false] 7). It is neither hard to produce nor hard to consume, although it is a bit lengthy and tedious to write out.
Here is one reasonable implementation of the producing function:
```clojure
(defn lazy-partition-by [f coll]
  (lazy-seq
   (when (seq coll)
     (let [x (first coll)
           k (f x)]
       (list* [::key k] x
              ((fn part [k xs]
                 (lazy-seq
                  (when (seq xs)
                    (let [x (first xs)
                          k' (f x)]
                      (if (= k k')
                        (cons x (part k (rest xs)))
                        (list* [::key k'] x (part k' (rest xs))))))))
               k (rest coll)))))))
```
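As a quick illustration of the laziness, you can take a prefix of the result for an infinite input; nothing is realized beyond what take asks for (the ::key markers print with their full namespace at the REPL):

```clojure
(take 7 (lazy-partition-by #(quot % 3) (range)))
;=> ([:user/key 0] 0 1 2 [:user/key 1] 3 4)
```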
And here's how to use it, first defining a generic reduce-grouped that hides the details of the grouping format, and then an example count-partition-sizes function that reports the key and size of each partition without keeping any sequences in memory:
```clojure
(defn reduce-grouped [f init groups]
  (loop [k nil, acc init, coll groups]
    (if (empty? coll)
      acc
      (if (and (coll? (first coll)) (= ::key (ffirst coll)))
        (recur (second (first coll)) acc (rest coll))
        (recur k (f k acc (first coll)) (rest coll))))))

(defn count-partition-sizes [f coll]
  (reduce-grouped (fn [k acc _]
                    (if (and (seq acc) (= k (first (peek acc))))
                      (conj (pop acc) (update-in (peek acc) [1] inc))
                      (conj acc [k 1])))
                  []
                  (lazy-partition-by f coll)))
```

```clojure
user> (lazy-partition-by even? [2 4 6 8 1 3 5 10 12 7])
([:user/key true] 2 4 6 8 [:user/key false] 1 3 5 [:user/key true] 10 12 [:user/key false] 7)
user> (count-partition-sizes even? [2 4 6 8 1 3 5 10 12 7])
[[true 4] [false 3] [true 2] [false 1]]
```
Edit: Looking at it again, I'm not convinced that my reduce-grouped is much more useful than (reduce f init (map g xs)), since it doesn't really give you a clear indication of when the key changes. So if you do need to know when a group changes, you'll want a smarter abstraction, or to use my original lazy-partition-by and not bother with the "smart" wrapper.
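If it helps, one possible shape for such a smarter abstraction is sketched below; the name and the extra new-group? argument are made up, and it has to live in the same namespace as lazy-partition-by so that ::key matches:

```clojure
(defn reduce-grouped-with-boundaries
  "Like reduce-grouped, but f is called as (f k new-group? acc x), where
   new-group? is true for the first element of each partition."
  [f init groups]
  (loop [k nil, new-group? false, acc init, coll groups]
    (if (empty? coll)
      acc
      (let [x (first coll)]
        (if (and (coll? x) (= ::key (first x)))
          (recur (second x) true acc (rest coll))
          (recur k false (f k new-group? acc x) (rest coll)))))))

;; e.g., collecting just the first element of each partition:
(reduce-grouped-with-boundaries
 (fn [k new-group? acc x] (if new-group? (conj acc [k x]) acc))
 []
 (lazy-partition-by even? [2 4 6 8 1 3 5 10 12 7]))
;=> [[true 2] [false 1] [true 10] [false 7]]
```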