Delphi compare text file contents

We need to compare the contents of two (or more) text files to determine if we need to back up. If they are different, we create a new backup.

I am currently using a CRC value for each file to check for differences, but I was wondering if there is a more efficient or elegant way to detect differences between files.

    //Use madZIP to calculate the CRC for this file
    GetUncompressedFileInfo(Filename_1, Size_1, NewCRC);
    //Use madZIP to calculate the CRC for this file
    GetUncompressedFileInfo(Filename_2, Size_2, OldCRC);
    //if ThisFileHash = ExistingFileHash then
    if (OldCRC <> NewCRC) then
      CreateABackup;

Regards, Peter.

+4
4 answers

CRC is probably more accurate and quite effective. However, do you need to check the contents?

I assume you check the CRC to see whether the file has been modified, and then back up the updated file. In that case, FileAge() would be just fine.
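A minimal sketch of the FileAge() approach, assuming the overload that returns a TDateTime through an out parameter (available in recent Delphi versions and in Free Pascal). NeedsBackup and the program name are hypothetical names used only for illustration:

```pascal
program BackupByDate;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: True when SourceFile is newer than BackupFile,
// or when no backup exists yet.
function NeedsBackup(const SourceFile, BackupFile: string): Boolean;
var
  SourceTime, BackupTime: TDateTime;
begin
  if not FileAge(SourceFile, SourceTime) then
    raise Exception.CreateFmt('Cannot read the timestamp of "%s"', [SourceFile]);
  if not FileAge(BackupFile, BackupTime) then
    Result := True                      // no backup to compare against
  else
    Result := SourceTime > BackupTime;  // source changed after the backup
end;

begin
  if ParamCount <> 2 then
  begin
    Writeln('Usage: BackupByDate <source> <backup>');
    Halt(1);
  end;
  try
    if NeedsBackup(ParamStr(1), ParamStr(2)) then
      Writeln('Backup required')
    else
      Writeln('Backup is up to date');
  except
    on E: Exception do
    begin
      Writeln(E.Message);
      Halt(1);
    end;
  end;
end.
```

This never opens either file, so it is cheap, but it only detects a changed timestamp, not changed content.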

+2

A CRC is not a safe way to detect file changes: it is short (32 bits for CRC-32), so two different files can produce the same value. Cryptographic hashes (like MD5 or SHA1) are much better.

Another approach (for example, used by build systems) is to compare file dates. If the file is newer than the backup, a new backup is required.

+7

You should also consider using incremental backups.

I published some optimized versioning features for our SynProject Open Source tool. The TVersions class in the ProjectVersioning unit lets you store binary diffs in a zip container.

Our proprietary SynLZ algorithm, which is faster than zip, is used to store the incremental differences. This works well in practice.

See the TVersions.FillStrings method for retrieving a list of files to be updated.

Keep in mind that two file dates may differ by exactly one hour, depending on whether daylight saving time is in effect. Here is how we compare dates while allowing for that:

    function SameFileDateWindows(FileDate1,FileDate2: integer): boolean;
    // we allow an exact one Hour round (NTFS bug on summer time zone change)
    begin
      dec(FileDate1,FileDate2);
      result := (FileDate1=0) or (FileDate1=1 shl 11) or (FileDate1=-(1 shl 11));
    end;

We do not read the contents of the file here. For backup purposes, it is enough to rely on the file date to flag a file for comparison. A diff is then computed between the two versions of the file. If the contents turn out to be identical, only the date difference is saved.
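The one-hour tolerance in SameFileDateWindows can be illustrated with a small sketch. It assumes the arguments are the packed DOS-style timestamps that the integer form of FileAge returns on Windows, where the hour occupies bits 11 to 15 of the low word, so a one-hour difference equals 1 shl 11. PackDosDateTime is a hypothetical helper written only for this demonstration:

```pascal
program DosDateDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: packs a date and time using the DOS/FAT layout
// (date in the high 16 bits, time in the low 16 bits, 2-second resolution).
function PackDosDateTime(Year, Month, Day, Hour, Minute, Second: Integer): Integer;
begin
  Result := ((((Year - 1980) shl 9) or (Month shl 5) or Day) shl 16)
         or ((Hour shl 11) or (Minute shl 5) or (Second div 2));
end;

// Same logic as the function quoted above.
function SameFileDateWindows(FileDate1, FileDate2: Integer): Boolean;
begin
  Dec(FileDate1, FileDate2);
  Result := (FileDate1 = 0) or (FileDate1 = 1 shl 11) or (FileDate1 = -(1 shl 11));
end;

var
  A, B, C: Integer;
begin
  A := PackDosDateTime(2011, 3, 27, 10, 30, 0);
  B := PackDosDateTime(2011, 3, 27, 11, 30, 0);  // exactly one hour after A
  C := PackDosDateTime(2011, 3, 27, 12, 30, 0);  // two hours after A

  Writeln(SameFileDateWindows(A, A));  // identical timestamps: treated as same
  Writeln(SameFileDateWindows(A, B));  // one hour apart: treated as same
  Writeln(SameFileDateWindows(A, C));  // two hours apart: treated as different
end.
```

Because the check subtracts packed values directly, it appears to cover only a one-hour shift within the same day. A shift that crosses midnight changes the date bits as well and would be reported as a different date.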

IMHO, you should not use the proprietary madzip container, but a standard one such as .zip. There are several options, including our own version used in SynProject and in our ORM. It is faster than madZip, with decompression written in optimized asm. See the SynZip unit for low-level compression and a simple .zip reader and writer, and SynZipFiles (used in SynProject) for more advanced classes. For a pure Delphi version, like madzip, check out the PasZip unit, which is faster than madzip (but PasZip will not compile with Unicode Delphi, whereas SynZip does).

+1

Actually, the best practice for identifying a file is to store both a content hash (for example CRC-32, or any other hash function) and the file size. Checking both greatly increases reliability. As for storage: there is no need to calculate the hash more than once for content that you know has not changed.
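A minimal sketch of the size-plus-hash idea, using a self-contained table-driven CRC-32 (the standard reflected polynomial $EDB88320) rather than any particular library. TFileFingerprint, GetFingerprint and SameFingerprint are hypothetical names, and the CRC computed here is not guaranteed to match the value madZIP reports:

```pascal
program FingerprintDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical record: what would be stored alongside each backup.
  TFileFingerprint = record
    Size: Int64;
    Crc: Cardinal;
  end;

var
  CrcTable: array[0..255] of Cardinal;

procedure InitCrcTable;
var
  I, J: Integer;
  C: Cardinal;
begin
  for I := 0 to 255 do
  begin
    C := Cardinal(I);
    for J := 1 to 8 do
      if (C and 1) <> 0 then
        C := (C shr 1) xor $EDB88320
      else
        C := C shr 1;
    CrcTable[I] := C;
  end;
end;

function GetFingerprint(const FileName: string): TFileFingerprint;
var
  Stream: TFileStream;
  Buffer: array[0..65535] of Byte;
  Count, I: Integer;
  Crc: Cardinal;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Result.Size := Stream.Size;
    Crc := $FFFFFFFF;
    repeat
      // Read the file in 64 KB chunks and fold each byte into the CRC.
      Count := Stream.Read(Buffer, SizeOf(Buffer));
      for I := 0 to Count - 1 do
        Crc := CrcTable[(Crc xor Buffer[I]) and $FF] xor (Crc shr 8);
    until Count = 0;
    Result.Crc := not Crc;
  finally
    Stream.Free;
  end;
end;

function SameFingerprint(const A, B: TFileFingerprint): Boolean;
begin
  // The size comparison is cheap and rules out many files on its own.
  Result := (A.Size = B.Size) and (A.Crc = B.Crc);
end;

begin
  if ParamCount <> 2 then
  begin
    Writeln('Usage: FingerprintDemo <file1> <file2>');
    Halt(1);
  end;
  InitCrcTable;
  if SameFingerprint(GetFingerprint(ParamStr(1)), GetFingerprint(ParamStr(2))) then
    Writeln('Fingerprints match')
  else
    Writeln('Files differ');
end.
```

In a real backup tool the fingerprint of the last backed-up version would be stored and reused, so that only the current file needs to be read on each run.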

0
