Idiomatic way of using while maintaining high performance

Question

Idiomatic way of using while maintaining high performance

I have a card that is sorted by its keys, which contains such data:

(def h {50 Text1 70 Text2 372 Text1 391 Text2 759 Text1 778 Text2 })

The map is sorted by keys. The key (number) can be interpreted as the position in which the corresponding value was found in a large block of text. In the above example, “Text1” was found at position 50 in the text.

Now I want to find all the Texts that were found in k positions of each other. I define the function a as follows:

  (defn nearest [mk] (for [m1 (keys m) m2 (keys m) :when (and (> m2 m1) (not= (m m1) (m m2)) (< (- m2 m1) k))] [m1 (get m m1) m2 (get m m2)])) (nearest h 50) ; [[50 "Text1" 70 "Text2"] [372 "Text1" 391 "Text2"] [759 "Text1" 778 "Text2"]]

This works, but too slowly, when the map m contains 100 thousand elements. Because the for loop actually scans all pairs of elements on the map. Since the map is sorted, for each element on the map there is no need to check additional elements as soon as the next element already exceeds k characters. I was able to write a version using loop and recur. But this is unclear. Is there a more natural way to do this using? I assume that (: while) should do the trick, but could not find a way.

 (defn nearest-quick [mk] (let [m1 (keys m) m2 (keys m)] (loop [inp m res [] i (first m1) m1 (rest m1) j (first m2) m2 (rest m2)] (cond (nil? i) res (nil? j)(recur inp res (first m1) (rest m1) j m2) (= ij) (recur inp res i m1 (first m2) (rest m2)) (< ji) (recur inp res i m1 (first m2) (rest m2)) (= (inp i) (inp j)) (recur inp res i m1 (first m2) (rest m2)) (< (- ji) k) (recur inp (conj res [i (inp i) j (inp j)]) i m1 (first m2) (rest m2)) (>= (- ji) k) (recur inp res (first m1) (rest m1) (first (rest m1)) (rest (rest m1)))))))

Note: with a card with 42K elements, the first version takes 90 minutes and the second takes 3 minutes.

+4

functional-programming lisp clojure

TIM S Apr 24 '13 at 15:04

source share

3 answers

Clojure for also has a :while modifier, so you can stop the iteration with the condition.

+2

Michiel borkent Apr 24 '13 at 15:51

source share

From all that I understood from your example:

 (def h (sorted-map 50 "Text1" 70 "Text2" 372 "Text1" 391 "Text2" 759 "Text1" 778 "Text2")) (->> (map #(-> [%1 %2]) h (rest h)) (filter (fn [[[ab] [xy]]] (< (- xa) 50))) (map flatten))

0

Ankur Apr 24 '13 at 15:43

source share

kotarak · Accepted Answer · 2013-04-24T15:26:45+0000

One could use subseq when the map is a sorted map.

 (defn nearest [mn] (for [[kv] m [nk nv] (subseq m < k < (+ kn)) :when (not= v nv)] [kv nk nv]))

The code is not tested.

Idiomatic way of using while maintaining high performance

More articles: