How to programmatically find the difference between two directories

First of all: I'm not necessarily looking for Delphi code; an answer in any language is fine.

I searched around (especially here) and found a few questions from people looking for ways to compare two directories (including subdirectories), although they were using byte-by-byte methods. Secondly, I'm not looking for a diff tool. I'm "just" looking for a way to find files that do not match and, just as importantly, files that are in one directory but not in the other, and vice versa.

More specifically: I have one directory (the backup folder) that I keep up to date using FindFirstChangeNotification. The first time, though, I need to copy all the files, and I also need to check the backup directory against the original when the application starts (in case something changed while the application was not running, or FindFirstChangeNotification missed a file change). To solve this, I am planning to build a list of CRCs for the backup files, then run through the source directory calculating the CRC of each file, and finally compare the two CRC lists. Then, somehow, find the files that are in one directory but not the other (again, and vice versa).

Here's the question: is this the fastest way? If so, how would one (roughly) go about completing the task?

+4
3 answers

You don't need a CRC for every file; for most normal purposes you can simply compare each file's last-modified date, which is faster. If you need extra safety, you can also compare lengths. You get both of these values for free from the find functions (on Win32, FindFirstFile/FindNextFile return them in WIN32_FIND_DATA).
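As a rough illustration of the size-plus-timestamp comparison, here is a minimal Python sketch (Python is used only for brevity; the question is language-agnostic). The function names `snapshot` and `compare_dirs` are made up for this example. It assumes the backup copy preserves modification times (as `shutil.copy2` does, for instance) and truncates timestamps to whole seconds, which may still be too fine for file systems with coarser resolution.

```python
import os

def snapshot(root):
    """Map each file's path relative to root to its (size, mtime) pair."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            # Whole seconds only: an assumption to tolerate sub-second drift.
            files[os.path.relpath(full, root)] = (st.st_size, int(st.st_mtime))
    return files

def compare_dirs(source, backup):
    """Return (only_in_source, only_in_backup, changed) as sorted lists."""
    src, bak = snapshot(source), snapshot(backup)
    only_in_source = sorted(src.keys() - bak.keys())
    only_in_backup = sorted(bak.keys() - src.keys())
    # Present on both sides but with a different size or timestamp.
    changed = sorted(p for p in src.keys() & bak.keys() if src[p] != bak[p])
    return only_in_source, only_in_backup, changed
```

Files in the first and third lists would be copied to the backup; files in the second list exist only in the backup and can be deleted or kept, depending on the backup policy.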

In the change-notification handler, you should probably just add the changed files to a queue and use a timer object to copy the queued files every ~30 seconds or so, so that you don't bog down the system with frequent updates/checks.

For added speed, use Win32 functions wherever possible and avoid Delphi's own file search/copy/attribute functions. I am not familiar with Delphi's framework, but the C# equivalents, for example, are way, way slower than the Win32 functions.
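The queue-plus-timer idea could look roughly like the following Python sketch. The class name `BatchedCopier` and the `copy_func` callback are hypothetical; the notification handler is assumed to call `notify()` for each changed path, and the 30-second default mirrors the interval suggested above.

```python
import threading

class BatchedCopier:
    """Collect changed paths and hand them to copy_func in periodic batches."""

    def __init__(self, copy_func, interval=30.0):
        self._copy_func = copy_func      # hypothetical callback: copies one path
        self._interval = interval
        self._pending = set()            # a set, so repeated changes collapse
        self._lock = threading.Lock()
        self._timer = None

    def notify(self, path):
        """Call from the change-notification handler; only records the path."""
        with self._lock:
            self._pending.add(path)
            if self._timer is None:
                # Start a one-shot timer on the first change of a batch.
                self._timer = threading.Timer(self._interval, self.flush)
                self._timer.daemon = True
                self._timer.start()

    def flush(self):
        """Copy everything queued so far, each path once."""
        with self._lock:
            batch, self._pending = self._pending, set()
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
        # Copy outside the lock so notify() is never blocked by slow I/O.
        for path in sorted(batch):
            self._copy_func(path)
```

A file that changes ten times within one interval is copied once, which is the point of batching. Calling `flush()` manually on shutdown avoids losing the last batch.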

+5

Even though you are "not looking for a diff tool", are you opposed to using Cygwin and shelling out to its diff command? If you are open to this, it is quite simple, especially with diff's -r ("recursive") option.

The following shows the differences between two Rails application directories on my machine. Besides reporting which files differ, grepping for "Only" specifically finds the files that exist in one directory but not the other:

    $ diff -r pgnindex pgnonrails | egrep '^Only|diff'
    Only in pgnindex/app/controllers: openings_controller.rb
    Only in pgnindex/app/helpers: openings_helper.rb
    Only in pgnindex/app/views: openings
    diff -r pgnindex/config/environment.rb pgnonrails/config/environment.rb
    diff -r pgnindex/config/initializers/session_store.rb pgnonrails/config/initializers/session_store.rb
    diff -r pgnindex/log/development.log pgnonrails/log/development.log
    Only in pgnindex/test/functional: openings_controller_test.rb
    Only in pgnindex/test/unit: helpers

0

The fastest way to compare a directory on a local computer with a directory on another computer thousands of kilometers away is exactly what you propose:

  • generate CRC / checksum for each file
  • send the name, path and CRC / checksum for each file over the Internet to another computer.
  • compare
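The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are invented for the example, CRC-32 from `zlib` stands in for whichever checksum is chosen, and the transfer step is left out (the manifest is a plain dictionary that could be serialized, for example as JSON, and sent to the other machine).

```python
import os
import zlib

def file_crc32(path, chunk_size=1 << 16):
    """CRC-32 of a file, read in chunks so large files need little memory."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc

def crc_manifest(root):
    """Map each relative path (with '/' separators) under root to its CRC-32."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = file_crc32(full)
    return manifest

def diff_manifests(local, remote):
    """Compare two manifests built by crc_manifest()."""
    common = local.keys() & remote.keys()
    return {
        "only_local": sorted(local.keys() - remote.keys()),
        "only_remote": sorted(remote.keys() - local.keys()),
        "different": sorted(p for p in common if local[p] != remote[p]),
    }
```

Note that this reads every byte of every file on both sides, so it trades local disk I/O for reduced network traffic; that trade only pays off when the two directories are not on the same machine.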

Perhaps the easiest way to do this is to use rsync with the "--dry-run" or "--list-only" option. (Either use one of the many applications that use the rsync algorithm, or compile the rsync algorithm into your application.)

    cd some_backup_directory
    rsync -rv --dry-run myname@remote_host:latest_version_directory .

For speed, rsync by default assumes, as Blindi suggested, that two files with the same name, the same path, the same length and the same modification time are identical. For added safety, you can give rsync the "--checksum" option so that it stops relying on the modification time and instead compares (a checksum of) the actual contents of each file.

0
