Is there a faster way to scan through a directory recursively in .NET?

I am writing a directory scanner in .NET.

For each file and directory, I need the following information:

  class Info
  {
      public bool IsDirectory;
      public string Path;
      public DateTime ModifiedDate;
      public DateTime CreatedDate;
  }

I have this function:

  static List<Info> RecursiveMovieFolderScan(string path)
  {
      var info = new List<Info>();
      var dirInfo = new DirectoryInfo(path);
      foreach (var dir in dirInfo.GetDirectories())
      {
          info.Add(new Info()
          {
              IsDirectory = true,
              CreatedDate = dir.CreationTimeUtc,
              ModifiedDate = dir.LastWriteTimeUtc,
              Path = dir.FullName
          });
          info.AddRange(RecursiveMovieFolderScan(dir.FullName));
      }
      foreach (var file in dirInfo.GetFiles())
      {
          info.Add(new Info()
          {
              IsDirectory = false,
              CreatedDate = file.CreationTimeUtc,
              ModifiedDate = file.LastWriteTimeUtc,
              Path = file.FullName
          });
      }
      return info;
  }

It turns out this implementation is quite slow. Is there any way to speed it up? I am thinking of hand-coding it with FindFirstFileW, but I would like to avoid that if there is a built-in method that is faster.

+24
c# filesystems
Apr 7 '09 at 4:33
8 answers

This implementation, which could use a bit of tweaking, is 5-10x faster.

  static List<Info> RecursiveScan2(string directory)
  {
      IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);
      WIN32_FIND_DATAW findData;
      IntPtr findHandle = INVALID_HANDLE_VALUE;

      var info = new List<Info>();
      try
      {
          findHandle = FindFirstFileW(directory + @"\*", out findData);
          if (findHandle != INVALID_HANDLE_VALUE)
          {
              do
              {
                  if (findData.cFileName == "." || findData.cFileName == "..") continue;

                  string fullpath = directory + (directory.EndsWith("\\") ? "" : "\\") + findData.cFileName;

                  bool isDir = false;
                  if ((findData.dwFileAttributes & FileAttributes.Directory) != 0)
                  {
                      isDir = true;
                      info.AddRange(RecursiveScan2(fullpath));
                  }

                  info.Add(new Info()
                  {
                      CreatedDate = findData.ftCreationTime.ToDateTime(),
                      ModifiedDate = findData.ftLastWriteTime.ToDateTime(),
                      IsDirectory = isDir,
                      Path = fullpath
                  });
              }
              while (FindNextFile(findHandle, out findData));
          }
      }
      finally
      {
          if (findHandle != INVALID_HANDLE_VALUE) FindClose(findHandle);
      }
      return info;
  }

extension method:

  public static class FILETIMEExtensions
  {
      public static DateTime ToDateTime(this System.Runtime.InteropServices.ComTypes.FILETIME filetime)
      {
          long highBits = filetime.dwHighDateTime;
          highBits = highBits << 32;
          // Cast the low half through uint so it is not sign-extended.
          return DateTime.FromFileTimeUtc(highBits | (long)(uint)filetime.dwLowDateTime);
      }
  }

interop defs:

  [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
  public static extern IntPtr FindFirstFileW(string lpFileName, out WIN32_FIND_DATAW lpFindFileData);

  [DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
  public static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATAW lpFindFileData);

  [DllImport("kernel32.dll")]
  public static extern bool FindClose(IntPtr hFindFile);

  [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
  public struct WIN32_FIND_DATAW
  {
      public FileAttributes dwFileAttributes;
      internal System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
      internal System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
      internal System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
      public int nFileSizeHigh;
      public int nFileSizeLow;
      public int dwReserved0;
      public int dwReserved1;
      [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
      public string cFileName;
      [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
      public string cAlternateFileName;
  }
+36
Apr 7 '09 at 5:00

Depending on how much time you are trying to shave off the function, it may be worth calling the Win32 API functions directly, since the existing API does a lot of extra processing to check things you may not be interested in.

If you haven't done so already, and assuming you don't intend to contribute to the Mono project, I would strongly recommend downloading Reflector and looking at how Microsoft implemented the API calls you are currently using. This will give you an idea of what you need to call and what you can leave out.

You might, for example, opt for an iterator that yields directory names instead of a function that returns a list, so that you don't end up iterating over the same list of names two or three times through all the different levels of code.
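A lazy version along those lines might look like the sketch below. It reuses the Info class from the question; the method name is illustrative, and the traversal order (entry first, then its children) is one reasonable choice among several:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class Info
{
    public bool IsDirectory;
    public string Path;
    public DateTime ModifiedDate;
    public DateTime CreatedDate;
}

static class Scanner
{
    // Yields each entry as it is found instead of building a List up front,
    // so callers can start processing before the whole tree has been walked.
    public static IEnumerable<Info> ScanLazy(string path)
    {
        foreach (var entry in new DirectoryInfo(path).GetFileSystemInfos())
        {
            bool isDir = (entry.Attributes & FileAttributes.Directory) != 0;
            yield return new Info
            {
                IsDirectory = isDir,
                CreatedDate = entry.CreationTimeUtc,
                ModifiedDate = entry.LastWriteTimeUtc,
                Path = entry.FullName
            };
            if (isDir)
                foreach (var child in ScanLazy(entry.FullName))
                    yield return child;
        }
    }
}
```

A caller then consumes entries one at a time (`foreach (var e in Scanner.ScanLazy(root)) ...`) rather than waiting for the full list to be built.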

+5
Apr 7 '09 at 4:50

There is a long history of file enumeration in .NET being slow. The problem is that there is no instantaneous way of enumerating large directory structures. Even the accepted answer here has issues with GC allocations.

The best I've been able to do is wrapped up in my library and exposed as the FindFile ( source ) class in the CSharpTest.Net.IO namespace. This class can enumerate files and folders without unneeded GC allocations and string marshalling.

The usage is simple enough, and the RaiseOnAccessDenied property will skip the directories and files the user does not have access to:

  private static long SizeOf(string directory)
  {
      var fcounter = new CSharpTest.Net.IO.FindFile(directory, "*", true, true, true);
      fcounter.RaiseOnAccessDenied = false;

      long size = 0, total = 0;
      fcounter.FileFound += (o, e) =>
      {
          if (!e.IsDirectory)
          {
              Interlocked.Increment(ref total);
              size += e.Length;
          }
      };

      Stopwatch sw = Stopwatch.StartNew();
      fcounter.Find();
      Console.WriteLine("Enumerated {0:n0} files totaling {1:n0} bytes in {2:n3} seconds.",
                        total, size, sw.Elapsed.TotalSeconds);
      return size;
  }

For my local C:\ drive, this outputs the following:

Enumerated 810,046 files totaling 307,707,792,662 bytes in 232.876 seconds.

Your mileage may vary by drive speed, but this is the fastest method I've found for enumerating files in managed code. The event argument is a mutating class of type FindFile.FileFoundEventArgs, so make sure you do not keep a reference to it, as its values will change for each event raised.

You might also note that the DateTimes exposed are in UTC only. The reason is that the conversion to local time is moderately expensive. You might consider using the UTC times to improve performance rather than converting them to local time.

+5
Sep 17 '12 at 18:01

"Its rather small: 371 dirs with an average of 10 files in each directory. Some dirs contain other sub-dirs."

This is somewhat of a comment, but your numbers do look quite high. I ran the test below using essentially the same recursive method you are using, and my times are far lower despite also building string output.

  public void RecurseTest(DirectoryInfo dirInfo, StringBuilder sb, int depth)
  {
      _dirCounter++;
      if (depth > _maxDepth) _maxDepth = depth;

      var array = dirInfo.GetFileSystemInfos();
      foreach (var item in array)
      {
          sb.Append(item.FullName);
          if (item is DirectoryInfo)
          {
              sb.Append(" (D)");
              sb.AppendLine();
              RecurseTest(item as DirectoryInfo, sb, depth + 1);
          }
          else
          {
              _fileCounter++;
          }
          sb.AppendLine();
      }
  }

I ran the above code on a number of different directories. On my machine, the second call to scan a directory tree was usually faster due to caching, either by the runtime or the file system. Note that this system isn't anything special, just a year-old development workstation.

 // cached call
 Dirs = 150, files = 420, max depth = 5
 Time taken = 53 milliseconds

 // cached call
 Dirs = 1117, files = 9076, max depth = 11
 Time taken = 433 milliseconds

 // first call
 Dirs = 1052, files = 5903, max depth = 12
 Time taken = 11921 milliseconds

 // first call
 Dirs = 793, files = 10748, max depth = 10
 Time taken = 5433 milliseconds (2nd run 363 milliseconds)

Out of concern that I wasn't retrieving the creation and modified dates, the code was modified to output these as well, with the following times.

 // now grabbing last update and creation time.
 Dirs = 150, files = 420, max depth = 5
 Time taken = 103 milliseconds (2nd run 93 milliseconds)

 Dirs = 1117, files = 9076, max depth = 11
 Time taken = 992 milliseconds (2nd run 984 milliseconds)

 Dirs = 793, files = 10748, max depth = 10
 Time taken = 1382 milliseconds (2nd run 735 milliseconds)

 Dirs = 1052, files = 5903, max depth = 12
 Time taken = 936 milliseconds (2nd run 595 milliseconds)

Note: the System.Diagnostics.Stopwatch class was used for timing.
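The cold-versus-cached effect mentioned above is easy to reproduce with the usual Stopwatch pattern. A minimal sketch (class and method names are illustrative, not from the answer's code):

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class TimingDemo
{
    // Walks a directory tree once and reports the elapsed time.
    public static long TimeScan(string root)
    {
        var sw = Stopwatch.StartNew();
        int count = 0;
        foreach (var entry in new DirectoryInfo(root)
                     .EnumerateFileSystemInfos("*", SearchOption.AllDirectories))
        {
            count++;
        }
        sw.Stop();
        Console.WriteLine("{0} entries, time taken = {1} milliseconds", count, sw.ElapsedMilliseconds);
        return sw.ElapsedMilliseconds;
    }

    public static void Run(string root)
    {
        TimeScan(root);  // first (cold) call hits the disk
        TimeScan(root);  // second call is usually faster: metadata is cached by the OS
    }
}
```

Running `TimingDemo.Run(...)` twice against the same tree should show the second pass completing in a fraction of the first pass's time, matching the "2nd run" figures above.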

+2
Apr 7 '09 at 21:59

I just came across this. Good implementation of the native version.

This version, although slower than the version using FindFirst and FindNext , is slightly faster than your original .NET version.

  static List<Info> RecursiveMovieFolderScan(string path)
  {
      var info = new List<Info>();
      var dirInfo = new DirectoryInfo(path);
      foreach (var entry in dirInfo.GetFileSystemInfos())
      {
          bool isDir = (entry.Attributes & FileAttributes.Directory) != 0;
          if (isDir)
          {
              info.AddRange(RecursiveMovieFolderScan(entry.FullName));
          }
          info.Add(new Info()
          {
              IsDirectory = isDir,
              CreatedDate = entry.CreationTimeUtc,
              ModifiedDate = entry.LastWriteTimeUtc,
              Path = entry.FullName
          });
      }
      return info;
  }

It should produce the same output as your native version. My testing shows that this version takes about 1.7 times as long as the version using FindFirst and FindNext . Timings were obtained in release mode, running without the debugger attached.

Curiously, changing GetFileSystemInfos to EnumerateFileSystemInfos adds about 5% to the running time in my tests. I rather expected it to run at the same speed, or possibly faster, since it doesn't have to create the array of FileSystemInfo objects.

The following code is shorter still, because it lets the Framework take care of the recursion. But it's 15-20% slower than the version above.

  static List<Info> RecursiveScan3(string path)
  {
      var info = new List<Info>();
      var dirInfo = new DirectoryInfo(path);
      foreach (var entry in dirInfo.EnumerateFileSystemInfos("*", SearchOption.AllDirectories))
      {
          info.Add(new Info()
          {
              IsDirectory = (entry.Attributes & FileAttributes.Directory) != 0,
              CreatedDate = entry.CreationTimeUtc,
              ModifiedDate = entry.LastWriteTimeUtc,
              Path = entry.FullName
          });
      }
      return info;
  }

Again, if you change that to GetFileSystemInfos , it is slightly (but only slightly) faster.

For my purposes, the first solution above is fast enough. The native version runs in about 1.6 seconds. The version using DirectoryInfo runs in about 2.9 seconds. I suppose if I were performing these scans very often, I'd change my mind.

+2
Apr 20 '12

I would use or build on this multi-threaded library: http://www.codeproject.com/KB/files/FileFind.aspx

+1
Apr 07 '09 at 5:49

Try this (i.e. do the initialization first, and then reuse your list and your DirectoryInfo objects):

  // Note: the wrapper takes the path as a parameter (the original snippet referenced an undeclared variable).
  static List<Info> RecursiveMovieFolderScan1(string path)
  {
      var info = new List<Info>();
      var dirInfo = new DirectoryInfo(path);
      RecursiveMovieFolderScan(dirInfo, info);
      return info;
  }

  static List<Info> RecursiveMovieFolderScan(DirectoryInfo dirInfo, List<Info> info)
  {
      foreach (var dir in dirInfo.GetDirectories())
      {
          info.Add(new Info()
          {
              IsDirectory = true,
              CreatedDate = dir.CreationTimeUtc,
              ModifiedDate = dir.LastWriteTimeUtc,
              Path = dir.FullName
          });
          RecursiveMovieFolderScan(dir, info);
      }
      foreach (var file in dirInfo.GetFiles())
      {
          info.Add(new Info()
          {
              IsDirectory = false,
              CreatedDate = file.CreationTimeUtc,
              ModifiedDate = file.LastWriteTimeUtc,
              Path = file.FullName
          });
      }
      return info;
  }
0
Apr 07 '09 at 4:41

I recently ran into the same question. I think it can also work well to dump all folders and files to a text file, then read the text file back with a StreamReader and do whatever processing you want, multi-threaded.

 cmd.exe /u /c dir "M:\" /s /b >"c:\flist1.txt" 
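The resulting list file can then be consumed line by line with a StreamReader. A minimal sketch (the class and method names are illustrative; the encoding matches the /u switch above, which makes cmd write its redirected output as UTF-16):

```csharp
using System;
using System.IO;
using System.Text;

static class ListReader
{
    // Reads the file list produced by: cmd.exe /u /c dir "M:\" /s /b > flist1.txt
    // The /u switch produces UTF-16 output, hence Encoding.Unicode here.
    public static int CountEntries(string listFile)
    {
        int count = 0;
        using (var reader = new StreamReader(listFile, Encoding.Unicode))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length == 0) continue;
                count++;  // per-path processing (queuing to worker threads, etc.) would go here
            }
        }
        return count;
    }
}
```

From here, each path could be handed off to a worker-thread pool, as the answer suggests.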

[update] Hi Mob, you're right. My approach is slower due to the overhead of reading back the output text file. I actually took some time to test the top answer against cmd.exe with 2 million files:

 Top answer:     2010100 files, time: 53023 ms
 cmd.exe method: 2010100 files, cmd time: 64907 ms, scan output file time: 19832 ms

The top answer's method (53023) is faster than cmd.exe (64907), not to mention the additional time needed to read back the text output file. Although my original point was to offer a not-so-bad alternative, I'm still sorry, ha.

0
Aug 07 '14 at 14:13


