Computing folder size in parallel

I am trying to calculate a folder's size in parallel. Perhaps this is a naive approach. I hand the calculation for each branch node (directory) to an agent, and the size of every leaf node (file) is added to my-size. Well, it doesn't work. :)

'scan' works fine, serially. 'pscan' only prints files from the first level.

  (def agents (atom []))
  (def my-size (atom 0))
  (def root-dir (clojure.java.io/file "/"))

  (defn scan [listing]
    (doseq [f listing]
      (if (.isDirectory f)
        (scan (.listFiles f))
        (swap! my-size #(+ % (.length f))))))

  (defn pscan [listing]
    (doseq [f listing]
      (if (.isDirectory f)
        (let [a (agent (.listFiles f))]
          (do (swap! agents #(conj % a))
              (send-off a pscan)
              (println (.getName f))))
        (swap! my-size #(+ % (.length f))))))

Do you have any idea what I did wrong?

Thanks.

2 answers

So computing file sizes in parallel should be simple, right?

It is not. :)

I tried to solve this problem in a better way. I realized that I was doing blocking I/O, so pmap does not do the job. I thought that handing chunks of directories (branches) to agents, to be processed independently, might make sense. It seems like it does. :) Well, I haven't tested it yet.

This works, but there may be some problems with symbolic links on UNIX-like systems.

  (def user-dir (clojure.java.io/file "/home/janko/projects/"))
  (def root-dir (clojure.java.io/file "/"))
  (def run? (atom true))
  (def *max-queue-length* 1024)
  (def *max-wait-time* 1000) ;; wait max 1 second then process anything left
  (def *chunk-size* 64)
  (def queue (java.util.concurrent.LinkedBlockingQueue. *max-queue-length*))
  (def agents (atom []))
  (def size-total (atom 0))
  (def ^:dynamic a (agent [])) ;; must be dynamic for `binding` below (Clojure 1.3+)

  (defn branch-producer [node]
    (if @run?
      (doseq [f node]
        (when (.isDirectory f)
          (do (.put queue f)
              (branch-producer (.listFiles f)))))))

  (defn producer [node]
    (future (branch-producer node)))

  (defn node-consumer [node]
    (if (.isFile node)
      (.length node)
      0))

  (defn chunk-length []
    (min (.size queue) *chunk-size*))

  (defn compute-sizes [a]
    (doseq [i (map (fn [f] (.listFiles f)) a)]
      (swap! size-total #(+ % (apply + (map node-consumer i))))))

  (defn consumer []
    (future
      (while @run?
        (when-let [size (if (zero? (chunk-length)) false (chunk-length))] ; appropriate size of work
          (binding [a (agent [])]
            (dotimes [_ size] ; give us all directories to process
              (when-let [item (.poll queue)]
                (set! a (agent (conj @a item)))))
            (swap! agents #(conj % a))
            (send-off a compute-sizes))
          (Thread/sleep *max-wait-time*)))))

You can run it by typing

  (producer (list user-dir)) (consumer) 

To see the result, type

  @size-total 

You can stop it with the following (there are futures running; correct me if I am wrong)

  (swap! run? not) 

If you find any bugs or mistakes, please share your ideas!
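
A possible simplification of the chunking step, as a sketch only: the `binding`/`set!` loop in the consumer pulls up to `*chunk-size*` items off the queue one at a time, and `LinkedBlockingQueue.drainTo` can do that in a single call. The helper name `take-chunk` and the demo queue `q` filled with integers are made up for illustration and are not part of the code above.

```clojure
(import '(java.util.concurrent LinkedBlockingQueue)
        '(java.util ArrayList))

;; Hypothetical helper: remove at most n items from the queue and
;; return them as a vector (empty if the queue is empty).
(defn take-chunk [^LinkedBlockingQueue q n]
  (let [buf (ArrayList.)]
    (.drainTo q buf (int n))
    (vec buf)))

;; Demo queue holding 100 integers instead of directories.
(def q (LinkedBlockingQueue. 1024))
(doseq [i (range 100)]
  (.put q i))

(def first-chunk (take-chunk q 64))
(def second-chunk (take-chunk q 64))

(println (count first-chunk))  ; 64
(println (count second-chunk)) ; 36
```

With something like this, the consumer could create one agent per chunk with `(agent (take-chunk queue *chunk-size*))` and would not need a rebindable var at all.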


There is no need to maintain state using atoms. A purely functional version:

  (defn psize [f]
    (if (.isDirectory f)
      (apply + (pmap psize (.listFiles f)))
      (.length f)))
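
A quick way to check this, as a sketch: the snippet below builds a small temporary tree and runs `psize` on it. It also adds a variant, `psize-nolinks`, that skips symbolic links via `java.nio.file.Files/isSymbolicLink`, since symlinks were raised as a concern in the other answer. The names `psize-nolinks`, `tmp-root` and `sub`, and the file contents, are assumptions made for the example.

```clojure
(require '[clojure.java.io :as io])
(import '(java.nio.file Files)
        '(java.nio.file.attribute FileAttribute))

(defn psize [f]
  (if (.isDirectory f)
    (apply + (pmap psize (.listFiles f)))
    (.length f)))

;; Hypothetical variant: count symbolic links as 0 so that link cycles
;; and double counting are avoided.
(defn psize-nolinks [^java.io.File f]
  (cond
    (Files/isSymbolicLink (.toPath f)) 0
    (.isDirectory f) (apply + (pmap psize-nolinks (.listFiles f)))
    :else (.length f)))

;; Tiny tree: tmp-root/a.txt (5 bytes) and tmp-root/sub/b.txt (3 bytes).
(def tmp-root
  (.toFile (Files/createTempDirectory "psize-demo" (make-array FileAttribute 0))))
(def sub (io/file tmp-root "sub"))
(.mkdir sub)
(spit (io/file tmp-root "a.txt") "hello")
(spit (io/file sub "b.txt") "abc")

(println (psize tmp-root))         ; 8
(println (psize-nolinks tmp-root)) ; 8
```

When this runs as a script rather than at the REPL, calling `(shutdown-agents)` at the end lets the JVM exit promptly, because `pmap` runs on the same thread pool that futures use.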