More efficient method to get directory size

I have already created a recursive function to get the size of a directory at a given folder path. It works; however, with the growing number of directories I have to scan (and the number of files in each of them), it has become a very slow, inefficient method.

    static string GetDirectorySize(string parentDir)
    {
        long totalFileSize = 0;
        string[] dirFiles = Directory.GetFiles(parentDir, "*.*",
                                System.IO.SearchOption.AllDirectories);

        foreach (string fileName in dirFiles)
        {
            // Use FileInfo to get length of each file.
            FileInfo info = new FileInfo(fileName);
            totalFileSize = totalFileSize + info.Length;
        }

        return String.Format(new FileSizeFormatProvider(), "{0:fs}", totalFileSize);
    }

This searches all subdirectories of the argument path, so the dirFiles array becomes quite large. Is there a better way to do this? I searched but found nothing.

Another idea that crossed my mind was to cache the results and, when the function is called again, try to find the differences and re-scan only the folders that changed. Not sure if this is a good idea ...
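Roughly what I mean by that, as an untested sketch (it assumes .NET 4 for EnumerateFiles/EnumerateDirectories, uses LastWriteTimeUtc as the change check, and a directory's timestamp only changes when its immediate contents change):

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;

    static class CachedSizes
    {
        // Hypothetical cache: directory path -> (timestamp seen, size of its own files).
        static readonly Dictionary<string, Tuple<DateTime, long>> cache =
            new Dictionary<string, Tuple<DateTime, long>>();

        // Size of the files directly inside 'dir', recomputed only when the
        // directory's LastWriteTimeUtc has changed since the previous call.
        static long GetOwnFileSize(string dir)
        {
            DateTime stamp = Directory.GetLastWriteTimeUtc(dir);
            Tuple<DateTime, long> cached;
            if (cache.TryGetValue(dir, out cached) && cached.Item1 == stamp)
                return cached.Item2;

            long size = new DirectoryInfo(dir).EnumerateFiles().Sum(f => f.Length);
            cache[dir] = Tuple.Create(stamp, size);
            return size;
        }

        // Total = the directory's own files plus every subdirectory's own files.
        static long GetDirectorySize(string parentDir)
        {
            return GetOwnFileSize(parentDir)
                 + Directory.EnumerateDirectories(parentDir, "*", SearchOption.AllDirectories)
                            .Sum(d => GetOwnFileSize(d));
        }
    }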

+8
Tags: c#, directory, search, recursion
5 answers

First you walk the tree to get a list of all the files. Then you open each file again to get its size. That means the scan is effectively done twice.

I suggest you use DirectoryInfo.GetFiles, which gives you FileInfo objects directly. These objects are pre-filled with their length.

In .NET 4, you can also use the EnumerateFiles method, which returns a lazy IEnumerable<FileInfo>.
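A minimal sketch of that suggestion (assuming .NET 4 so EnumerateFiles is available; the method name is just illustrative):

    using System.IO;
    using System.Linq;

    static long GetDirectorySize(string parentDir)
    {
        // The FileInfo objects returned here already have Length populated,
        // so each file is only visited once. EnumerateFiles streams the
        // results lazily instead of building the whole array up front.
        return new DirectoryInfo(parentDir)
            .EnumerateFiles("*", SearchOption.AllDirectories)
            .Sum(file => file.Length);
    }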

+24

This is more cryptic, but it took about 2 seconds for 10k.

    public static long GetDirectorySize(string parentDirectory)
    {
        return new DirectoryInfo(parentDirectory)
            .GetFiles("*.*", SearchOption.AllDirectories)
            .Sum(file => file.Length);
    }
+10

Try

    DirectoryInfo DirInfo = new DirectoryInfo(@"C:\DataLoad\");
    Stopwatch sw = new Stopwatch();
    try
    {
        sw.Start();
        Int64 ttl = 0;
        Int32 fileCount = 0;

        foreach (FileInfo fi in DirInfo.EnumerateFiles("*", SearchOption.AllDirectories))
        {
            ttl += fi.Length;
            fileCount++;
        }

        sw.Stop();
        Debug.WriteLine(sw.ElapsedMilliseconds.ToString() + " " + fileCount.ToString());
    }
    catch (Exception Ex)
    {
        Debug.WriteLine(Ex.ToString());
    }

It did 700,000 files in 70 seconds on a non-RAID P4 desktop, so about 10,000 files per second. A server-class machine should easily do 100,000+ per second.

As usr says (+1), EnumerateFiles gives you FileInfo objects that already have their Length populated.

+10

You can speed up the execution of your function a little by using EnumerateFiles() instead of GetFiles() . At the very least, you will not load the complete list into memory.

If that is not enough, you should make the function more complex, using threads (one thread per directory is too much, but there is no general rule).
You can use a fixed number of threads that take directories from a queue; each thread calculates a directory's size and adds it to the total. Something like:

  • Get a list of all directories (not files).
  • Create N threads (one per core, for example).
  • Each thread scans the directory and calculates the size.
  • If there is no other directory in the queue, the thread ends.
  • If there is a directory in the queue, it calculates its size, and so on.
  • The function terminates when all threads terminate.

You can improve the algorithm significantly by spreading the directory search across all the threads (for example, when a thread scans a directory, it adds the folders it finds to the queue). That makes it more complicated, so do it only if this version is still too slow (this task was used by Microsoft as an example for the new Task Parallel Library).
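A rough sketch of the fixed-number-of-threads scheme described in the list above (assuming .NET 4 for ConcurrentQueue and Task; names are illustrative and it only handles the simple case, without the work-spreading refinement):

    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;

    static class ParallelDirSize
    {
        static long GetDirectorySize(string parentDir)
        {
            // Step 1: queue the root plus every subdirectory.
            var queue = new ConcurrentQueue<string>(
                new[] { parentDir }.Concat(
                    Directory.EnumerateDirectories(parentDir, "*", SearchOption.AllDirectories)));

            long total = 0;

            // Step 2: one worker per core drains the queue; each worker sums the
            // files directly inside the directory it dequeued (subdirectories are
            // separate queue items) and adds the result to the shared total.
            var workers = Enumerable.Range(0, Environment.ProcessorCount)
                .Select(n => Task.Factory.StartNew(() =>
                {
                    string dir;
                    while (queue.TryDequeue(out dir))
                    {
                        long size = new DirectoryInfo(dir).EnumerateFiles().Sum(f => f.Length);
                        Interlocked.Add(ref total, size);
                    }
                }))
                .ToArray();

            // Step 3: the function returns when every worker has finished.
            Task.WaitAll(workers);
            return total;
        }
    }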

+4
    long length = Directory.GetFiles(@"MainFolderPath", "*", SearchOption.AllDirectories)
                           .Sum(t => new FileInfo(t).Length);
-1
