The fastest way to copy files from one directory to another

I need to copy files from one directory to another, depending on whether the file name exists in a SQL database table.

For this, I use the following code:

    using (SqlConnection connection = new SqlConnection("datasource or route"))
    {
        connection.Open();
        using (SqlCommand cmd = new SqlCommand("SELECT idPic, namePicFile FROM DocPicFiles", connection))
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            if (reader != null)
            {
                while (reader.Read())
                {
                    // picList is an array that contains all the file names in a directory
                    if (picList.Any(s => s.Contains(reader["namePicFile"].ToString())))
                    {
                        File.Copy("the file in the directory or array picList", "the destination directory" + ".jpg", false);
                    }
                }
            }
        }
    }

Is there a way this can be done in less time? It currently takes about an hour for 20,876 entries.

4 answers

Since your I/O subsystem is almost certainly the bottleneck here, using the Task Parallel Library is probably about as good as you can do:

    // Requires: using System; using System.Collections.Generic; using System.Data;
    //           using System.Data.SqlClient; using System.IO; using System.Linq;
    //           using System.Threading.Tasks;

    static void Main(string[] args)
    {
        DirectoryInfo source      = new DirectoryInfo(args[0]);
        DirectoryInfo destination = new DirectoryInfo(args[1]);

        HashSet<string> filesToBeCopied = new HashSet<string>(ReadFileNamesFromDatabase(), StringComparer.OrdinalIgnoreCase);

        // You'll probably have to play with MaxDegreeOfParallelism so as to avoid swamping the I/O system
        ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

        Parallel.ForEach(filesToBeCopied.SelectMany(fn => source.EnumerateFiles(fn)), options, fi =>
        {
            string destinationPath = Path.Combine(destination.FullName, Path.ChangeExtension(fi.Name, ".jpg"));
            fi.CopyTo(destinationPath, false);
        });
    }

    public static IEnumerable<string> ReadFileNamesFromDatabase()
    {
        using (SqlConnection connection = new SqlConnection("connection-string"))
        using (SqlCommand cmd = connection.CreateCommand())
        {
            cmd.CommandType = CommandType.Text;
            cmd.CommandText = @"
                select idPic, namePicFile
                from   DocPicFiles
            ";

            connection.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    yield return reader.GetString(1);
                }
            }
            connection.Close();
        }
    }

File.Copy is as fast as it gets. You have to remember that you are bound by the file transfer speed dictated by your hardware, and with 20,000 files the latency of accessing the data also comes into play. If you are doing this on a hard drive, you may see a significant improvement after switching to an SSD or some other fast medium.

In this case, the hardware is most likely your bottleneck.

EDIT: I believe that keeping the database connection open for this long is bad practice. I suggest you fetch all the necessary data into some in-memory cache (an array, a list, whatever) and then iterate over that while copying the files. A database connection is a valuable resource, and for applications that need to handle high concurrency (but not only those), releasing the connection quickly is a must.
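For illustration, here is a minimal sketch of that approach. It reuses the table and column names from the question, but the class, method, and parameter names, the directory layout, and the ".jpg" suffix handling are assumptions, not code from the answer:

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.IO;

    static class CachedCopySketch
    {
        // Read all file names into memory first, so the connection is released
        // before any file I/O starts.
        static List<string> LoadFileNames(string connectionString)
        {
            var names = new List<string>();
            using (var connection = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand("SELECT namePicFile FROM DocPicFiles", connection))
            {
                connection.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                        names.Add(reader.GetString(0));
                }
            } // connection disposed here, long before the copying begins

            return names;
        }

        // Then copy each file whose name came back from the database.
        static void CopyFiles(IEnumerable<string> names, string sourceDir, string destinationDir)
        {
            foreach (string name in names)
            {
                string sourcePath = Path.Combine(sourceDir, name);
                if (File.Exists(sourcePath))
                    File.Copy(sourcePath, Path.Combine(destinationDir, name + ".jpg"), false);
            }
        }
    }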


Let me take a guess... Hmm... no, there is no way to do this faster.

Why am I so sure? Because copying files requires talking to the disk, and that is an awfully slow operation. Moreover, if you try to use multithreading, the result will be slower, not faster, because the "mechanical" operation of moving the head across the disk is no longer sequential, which it might have been by chance before.

See the answers to this question I asked earlier.

So, switch to SSDs if you are not already using them; otherwise you are already getting the best you can.

Below is something to help you picture just how slow disk access is compared to the caches. If a cache access took 10 minutes, a read from disk would take about 2 years. All the latencies are shown in the image below. Clearly, when your code runs, the bottleneck is the disk writes. The best you can do is keep your disk writes sequential.

[Image: comparison of cache, memory, and disk access latencies]
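To put that analogy into numbers, here is a small back-of-the-envelope calculation using commonly cited approximate latencies (roughly 100 ns for a cache/memory access and 10 ms for a random disk access); the exact figures vary by hardware, so treat this only as an order-of-magnitude sketch:

    using System;

    class LatencyScale
    {
        static void Main()
        {
            // Rough scaling of the "10 minutes vs. ~2 years" analogy above.
            // These figures are commonly cited approximations, not measurements.
            const double cacheAccessSeconds = 100e-9; // ~100 ns cache/memory access
            const double diskAccessSeconds  = 10e-3;  // ~10 ms random disk access

            double ratio = diskAccessSeconds / cacheAccessSeconds; // ~100,000x

            // Pretend a cache access takes 10 minutes; scale the disk access accordingly.
            double scaledDiskYears = 10.0 * ratio / (60 * 24 * 365);
            Console.WriteLine($"Scaled disk access: about {scaledDiskYears:F1} years"); // ~1.9
        }
    }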


I solved this problem by creating a single compressed (.zip) file, using the option that just stores the files (no compression). Creating one .zip file, moving that single file, and then extracting it at the destination turned out to be twice as fast when handling thousands of files.
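For illustration, here is a rough sketch of that approach using System.IO.Compression; it assumes the list of file names has already been read from the database, and the method and variable names are hypothetical, not from the original answer:

    using System.Collections.Generic;
    using System.IO;
    using System.IO.Compression;

    static class ZipCopySketch
    {
        static void CopyViaStoredZip(IEnumerable<string> fileNames, string sourceDir, string destinationDir)
        {
            string zipPath = Path.Combine(Path.GetTempPath(), "transfer.zip");

            // Pack the selected files in "store" mode (no compression): the goal is one
            // large sequential write instead of thousands of small ones, not saving space.
            using (ZipArchive archive = ZipFile.Open(zipPath, ZipArchiveMode.Create))
            {
                foreach (string name in fileNames)
                {
                    string sourceFile = Path.Combine(sourceDir, name);
                    if (File.Exists(sourceFile))
                        archive.CreateEntryFromFile(sourceFile, name, CompressionLevel.NoCompression);
                }
            }

            // Move the single archive and unpack it at the destination.
            string movedZip = Path.Combine(destinationDir, "transfer.zip");
            File.Move(zipPath, movedZip);
            ZipFile.ExtractToDirectory(movedZip, destinationDir);
            File.Delete(movedZip);
        }
    }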

