I have a card that is sorted by its keys, which contains such data:
(def h {50 Text1 70 Text2 372 Text1 391 Text2 759 Text1 778 Text2 })
The map is sorted by keys. The key (number) can be interpreted as the position in which the corresponding value was found in a large block of text. In the above example, βText1β was found at position 50 in the text.
Now I want to find all the Texts that were found in k positions of each other. I define the function a as follows:
(defn nearest [mk] (for [m1 (keys m) m2 (keys m) :when (and (> m2 m1) (not= (m m1) (m m2)) (< (- m2 m1) k))] [m1 (get m m1) m2 (get m m2)])) (nearest h 50) ; [[50 "Text1" 70 "Text2"] [372 "Text1" 391 "Text2"] [759 "Text1" 778 "Text2"]]
This works, but too slowly, when the map m contains 100 thousand elements. Because the for loop actually scans all pairs of elements on the map. Since the map is sorted, for each element on the map there is no need to check additional elements as soon as the next element already exceeds k characters. I was able to write a version using loop and recur. But this is unclear. Is there a more natural way to do this using? I assume that (: while) should do the trick, but could not find a way.
(defn nearest-quick [mk] (let [m1 (keys m) m2 (keys m)] (loop [inp m res [] i (first m1) m1 (rest m1) j (first m2) m2 (rest m2)] (cond (nil? i) res (nil? j)(recur inp res (first m1) (rest m1) j m2) (= ij) (recur inp res i m1 (first m2) (rest m2)) (< ji) (recur inp res i m1 (first m2) (rest m2)) (= (inp i) (inp j)) (recur inp res i m1 (first m2) (rest m2)) (< (- ji) k) (recur inp (conj res [i (inp i) j (inp j)]) i m1 (first m2) (rest m2)) (>= (- ji) k) (recur inp res (first m1) (rest m1) (first (rest m1)) (rest (rest m1)))))))
Note: with a card with 42K elements, the first version takes 90 minutes and the second takes 3 minutes.
TIM S source share