Fast (low-level) method for recursively processing files in folders

My application indexes the contents of all hard drives on end-user computers. I use Directory.GetFiles and Directory.GetDirectories to recursively walk the entire folder structure. I index only a few selected file types (up to 10 of them).

I can see in the profiler that most of the indexing time is spent on listing files and folders: up to 90% of the time, depending on the ratio of files that actually get indexed.

I would like the indexing to run as fast as possible. I have already optimized the indexing and processing of the indexed files themselves.

I thought about using Win32 API calls directly, but the profiler shows that most of the processing time is already spent inside the API calls that .NET itself makes.

Is there a way (possibly low-level) available from C# to make listing files/folders at least somewhat faster?


As requested in the comments, my current code (just the skeleton, with irrelevant details cut out):

    private IEnumerable<IndexedEntity> RecurseFolder(string indexedFolder)
    {
        // for a single extension:
        string[] files = Directory.GetFiles(indexedFolder, extensionFilter);
        foreach (string file in files)
        {
            yield return ProcessFile(file);
        }
        foreach (string directory in Directory.GetDirectories(indexedFolder))
        {
            // recursively process all subdirectories
            foreach (var ie in RecurseFolder(directory))
            {
                yield return ie;
            }
        }
    }
2 answers

.NET 4.0 has built-in enumerable file-listing methods (Directory.EnumerateFiles and Directory.EnumerateDirectories); since it is not far off, I would try using those. They stream results instead of building the whole listing up front, which can matter in particular if you have folders that are massively populated (each requiring a large array allocation).
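A minimal sketch of that approach (the helper and its name are mine; Directory.EnumerateFiles with SearchOption.AllDirectories also takes care of the recursion internally):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LazyLister
{
    // Streams matching file paths as they are found; no per-directory
    // string[] is materialized, unlike Directory.GetFiles.
    public static IEnumerable<string> EnumerateMatchingFiles(string root, string filter)
    {
        return Directory.EnumerateFiles(root, filter, SearchOption.AllDirectories);
    }
}
```

One caveat: with SearchOption.AllDirectories an UnauthorizedAccessException in any subfolder aborts the whole enumeration, so for indexing entire drives a manual queue calling EnumerateFiles per folder (as in the code below) is more robust.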

If the problem is depth, I would consider flattening your method to use a local stack/queue and a single iterator block. This shortens the code path used to enumerate deep folders:

    private static IEnumerable<string> WalkFiles(string path, string filter)
    {
        var pending = new Queue<string>();
        pending.Enqueue(path);
        string[] tmp;
        while (pending.Count > 0)
        {
            path = pending.Dequeue();
            tmp = Directory.GetFiles(path, filter);
            for (int i = 0; i < tmp.Length; i++)
            {
                yield return tmp[i];
            }
            tmp = Directory.GetDirectories(path);
            for (int i = 0; i < tmp.Length; i++)
            {
                pending.Enqueue(tmp[i]);
            }
        }
    }

Iterate over that, feeding the results to your ProcessFile.
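Wired into the question's code, the caller stays a thin iterator. In this sketch, ProcessFile and IndexedEntity are stand-ins for the asker's types, and WalkFiles is repeated from above so the snippet is self-contained:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class Indexer
{
    // Stand-in for the asker's result type.
    public sealed class IndexedEntity { public string Path; }

    // Stand-in for the asker's per-file work.
    private static IndexedEntity ProcessFile(string file) => new IndexedEntity { Path = file };

    // WalkFiles as in the answer above.
    public static IEnumerable<string> WalkFiles(string path, string filter)
    {
        var pending = new Queue<string>();
        pending.Enqueue(path);
        while (pending.Count > 0)
        {
            path = pending.Dequeue();
            string[] tmp = Directory.GetFiles(path, filter);
            for (int i = 0; i < tmp.Length; i++) yield return tmp[i];
            tmp = Directory.GetDirectories(path);
            for (int i = 0; i < tmp.Length; i++) pending.Enqueue(tmp[i]);
        }
    }

    // Replaces the recursive RecurseFolder from the question.
    public static IEnumerable<IndexedEntity> Index(string root, string filter)
    {
        foreach (string file in WalkFiles(root, filter))
            yield return ProcessFile(file);
    }
}
```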


If you think the .NET implementation is causing the problem, I suggest you use the native find calls: _findfirst, _findnext, etc. (the C runtime wrappers around the Win32 FindFirstFile/FindNextFile API).

It seems to me that .NET requires a lot of memory because the listings are copied completely into arrays at each directory level. So if your directory structure is 10 levels deep, you hold 10 versions of the files array at any given moment, plus an allocation and deallocation of such an array for every directory in the structure.

Using the same recursive method with _findfirst etc., all that needs to be kept at each recursion level is a handle to the current position in the directory structure.
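For completeness, here is one way that could look from C# via P/Invoke. This is a sketch (NativeWalker and Walk are my names); the declarations follow the documented kernel32 FindFirstFile/FindNextFile/FindClose signatures, and it is Windows-only by nature:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

static class NativeWalker
{
    private static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    private struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public uint nFileSizeHigh, nFileSizeLow, dwReserved0, dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)] public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)] public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool FindClose(IntPtr hFindFile);

    // Recursively yields files matching filePattern. Only one open find
    // handle is held per recursion level; no arrays of names are built.
    public static IEnumerable<string> Walk(string directory, string filePattern)
    {
        WIN32_FIND_DATA data;

        // Pass 1: files matching the pattern in this directory.
        IntPtr h = FindFirstFile(Path.Combine(directory, filePattern), out data);
        if (h != INVALID_HANDLE_VALUE)
        {
            try
            {
                do
                {
                    if ((data.dwFileAttributes & FileAttributes.Directory) == 0)
                        yield return Path.Combine(directory, data.cFileName);
                } while (FindNextFile(h, out data));
            }
            finally { FindClose(h); }
        }

        // Pass 2: recurse into every subdirectory.
        h = FindFirstFile(Path.Combine(directory, "*"), out data);
        if (h != INVALID_HANDLE_VALUE)
        {
            try
            {
                do
                {
                    if ((data.dwFileAttributes & FileAttributes.Directory) != 0 &&
                        data.cFileName != "." && data.cFileName != "..")
                    {
                        foreach (string f in Walk(Path.Combine(directory, data.cFileName), filePattern))
                            yield return f;
                    }
                } while (FindNextFile(h, out data));
            }
            finally { FindClose(h); }
        }
    }
}
```

Note that on .NET 4.0 and later, Directory.EnumerateFiles gives you much the same win without any interop, so this route is mainly worth the trouble on .NET 3.5 and earlier.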

