C# - listing a directory containing a massive number of files

Here is the scenario:

I have a directory with 2 million files. The code below writes out all the file names in about 90 minutes. Does anyone have a way to speed it up or make this code more efficient? I would also like to just write the file names to a list.

    string lines = listBox1.Items.ToString();
    string sourcefolder1 = textBox1.Text;
    string destinationfolder = @"C:\anfiles";

    using (StreamWriter output = new StreamWriter(destinationfolder + "\\" + "MasterANN.txt"))
    {
        string[] files = Directory.GetFiles(textBox1.Text, "*.txt");
        foreach (string file in files)
        {
            FileInfo file_info = new FileInfo(file);
            output.WriteLine(file_info.Name);
        }
    }

The slowdown is that it writes out one line at a time.

It takes about 13-15 minutes to get all the files that need to be written.

The remaining ~75 minutes are spent creating the output file.

+4
c# file-io
5 answers

It may help not to create a FileInfo instance for each file; use Path.GetFileName instead:

    string lines = listBox1.Items.ToString();
    string sourcefolder1 = textBox1.Text;
    string destinationfolder = @"C:\anfiles";

    using (StreamWriter output = new StreamWriter(Path.Combine(destinationfolder, "MasterANN.txt")))
    {
        string[] files = Directory.GetFiles(textBox1.Text, "*.txt");
        foreach (string file in files)
        {
            output.WriteLine(Path.GetFileName(file));
        }
    }
+8

You are reading 2 million file entries into memory. Depending on how much memory you have, you may well be paging to disk. Try breaking the work into smaller pieces by filtering on the file name.
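A minimal sketch of the chunking idea, assuming file names are spread across alphanumeric leading characters (adjust the pattern list to the real naming scheme; `WriteNamesInChunks` and the pattern set are hypothetical, not from the original answer). Each `GetFiles` call still scans the directory, so this trades extra scan time for a much smaller peak memory footprint:

```csharp
using System;
using System.IO;

static class ChunkedListing
{
    // Hypothetical helper: write file names one leading-character
    // chunk at a time, so only one chunk's worth of entries is ever
    // held in memory instead of all 2 million.
    public static void WriteNamesInChunks(string sourceFolder, TextWriter output)
    {
        foreach (char c in "abcdefghijklmnopqrstuvwxyz0123456789")
        {
            foreach (string file in Directory.GetFiles(sourceFolder, c + "*.txt"))
                output.WriteLine(Path.GetFileName(file));
        }
    }
}
```

It could be called as `ChunkedListing.WriteNamesInChunks(textBox1.Text, output)` inside the existing `using` block.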

+6

The first thing I would want to know is where the slowdown is: does Directory.GetFiles() take the 89 minutes to execute, or is the delay caused by the calls to FileInfo file_info = new FileInfo(file);?

If the delay comes from the latter, you can probably speed things up by getting the file name from the path instead of creating a FileInfo instance:

 System.IO.Path.GetFileName(file); 
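To answer that question, the two phases can be timed separately with System.Diagnostics.Stopwatch. A sketch (the `ListingTimer` class is hypothetical; substitute the real source folder for the argument):

```csharp
using System.Diagnostics;
using System.IO;

static class ListingTimer
{
    // Returns (ms spent in GetFiles, ms spent extracting names) so the
    // two phases of the original loop can be compared.
    public static (long getFilesMs, long namesMs) TimePhases(string sourceFolder)
    {
        var sw = Stopwatch.StartNew();
        string[] files = Directory.GetFiles(sourceFolder, "*.txt");
        long getFilesMs = sw.ElapsedMilliseconds;

        sw.Restart();
        foreach (string file in files)
        {
            Path.GetFileName(file);   // pure string parse, no disk access
        }
        return (getFilesMs, sw.ElapsedMilliseconds);
    }
}
```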
+5

In my experience, it is Directory.GetFiles that slows you down (apart from any console output). To overcome this, P/Invoke FindFirstFile / FindNextFile to avoid the memory consumption and the up-front latency.

+3

With Directory.EnumerateFiles you do not need to load all the file names into memory first. See: C# directory.getfiles memory help

In your case, the code could be:

    using (StreamWriter output = new StreamWriter(destinationfolder + "\\" + "MasterANN.txt"))
    {
        foreach (var file in Directory.EnumerateFiles(sourcefolder, "*.txt"))
        {
            output.WriteLine(Path.GetFileName(file));
        }
    }

The documentation says:

The EnumerateFiles and GetFiles methods differ as follows: when you use EnumerateFiles, you can start enumerating a collection of names before returning the entire collection; when you use GetFiles, you must wait for the entire array of names to return before you can access the array. Therefore, when you work with many files and directories, EnumerateFiles may be more efficient.

So, if you have enough memory, Directory.GetFiles is fine. But Directory.EnumerateFiles is much better if the folder contains millions of files.
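The difference is easy to see with a small sketch (the `LazyListingDemo` class is illustrative, not part of the original answer): because EnumerateFiles yields entries as the operating system returns them, taking the first few names stops the scan early instead of buffering the whole directory the way GetFiles does:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LazyListingDemo
{
    // Returns the first `count` file names without buffering the whole
    // directory: the enumeration stops as soon as `count` names are taken.
    public static List<string> FirstNames(string sourceFolder, int count)
    {
        return Directory.EnumerateFiles(sourceFolder, "*.txt")
                        .Take(count)
                        .Select(Path.GetFileName)
                        .ToList();
    }
}
```

With 2 million files, `FirstNames(folder, 5)` returns almost instantly, while `Directory.GetFiles(folder, "*.txt")` must finish the entire scan first.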

0
