What is the easiest way to find out programmatically if two files are different?

What is the easiest way to find out programmatically whether two text files are different? Given two files, I just need to know whether they differ or not. This is for a quick tool to help with a particularly unpleasant merge (switching languages from VB to C# in one branch (yay!) and making many changes in another); it will not go into production.

Possible solutions:

  • Hash both files and compare the hashes
  • Read both files into strings and just do a string comparison
  • Call an external comparison tool (unfortunately, WinMerge does not have a CLI for this)

If possible, ignoring whitespace would be awesome, but I don't care that much about it. The main thing is that it is quick and easy.

I am using .NET 3.5 SP1. Thanks for any ideas or pointers.

+7
8 answers

There is an article in the Microsoft Knowledge Base; I hope it helps. It compares bytes to see whether the two files are different: How to create a file comparison function in Visual C#

+11

The fastest way to do this is to compare the files byte by byte as they are read from a stream. Hashing both files takes too long for large files, and so do string comparisons and external tools.

A byte-by-byte comparison is best for you, because it reads to the end of the files only when they are identical.

With hash comparisons, string comparisons, or external tools, you have to read both files in full every time you compare them. A byte-by-byte comparison does that only when the files are identical; otherwise it stops at the first difference.
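The early exit described above can be sketched as follows. This is a minimal illustration, not a tuned implementation: the class name `FileCompare`, the method names, and the 64 KB buffer size are all assumptions made for the example. It reads both files in chunks and returns as soon as a difference is found.

```csharp
using System;
using System.IO;

public static class FileCompare
{
    // Hypothetical helper: returns true when both files contain the same bytes.
    public static bool HaveSameBytes(string path1, string path2)
    {
        using (FileStream a = File.OpenRead(path1))
        using (FileStream b = File.OpenRead(path2))
        {
            // Different lengths mean different files; no bytes need to be read.
            if (a.Length != b.Length) return false;

            byte[] bufA = new byte[64 * 1024]; // buffer size is an arbitrary choice
            byte[] bufB = new byte[64 * 1024];
            while (true)
            {
                int readA = ReadFully(a, bufA);
                int readB = ReadFully(b, bufB);
                if (readA != readB) return false;
                if (readA == 0) return true; // both streams ended without a mismatch

                for (int i = 0; i < readA; i++)
                {
                    if (bufA[i] != bufB[i]) return false; // stop at the first difference
                }
            }
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until the
    // buffer is full or the stream ends.
    private static int ReadFully(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

Only identical files are read to the end; files that differ early cost just one chunk each.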

+10

Check byte by byte, here is the code:

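// Requires: using System.IO;
public static bool AreFilesIdentical(string path1, string path2)
{
    // FileStream has no constructor that takes only a path, so open read-only.
    using (FileStream file1 = new FileStream(path1, FileMode.Open, FileAccess.Read))
    using (FileStream file2 = new FileStream(path2, FileMode.Open, FileAccess.Read))
    {
        // Files of different lengths cannot be identical.
        if (file1.Length != file2.Length)
        {
            return false;
        }

        // Stop at the first differing byte.
        while (file1.Position < file1.Length)
        {
            if (file1.ReadByte() != file2.ReadByte())
            {
                return false;
            }
        }
        return true;
    }
}

+4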

How about using the MD5 hash algorithm and comparing the results? Here is an example.
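The example this answer linked to did not survive, so what follows is only a sketch of the general idea, not the original. The names `HashCompare`, `ComputeMd5`, and `HaveSameMd5` are made up for illustration; `MD5.Create` and `ComputeHash` come from `System.Security.Cryptography`.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class HashCompare
{
    // Hypothetical helper: computes the MD5 digest of a file's contents.
    public static byte[] ComputeMd5(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return md5.ComputeHash(stream);
        }
    }

    // Hypothetical helper: true when both files produce the same MD5 digest.
    public static bool HaveSameMd5(string path1, string path2)
    {
        byte[] h1 = ComputeMd5(path1);
        byte[] h2 = ComputeMd5(path2);
        for (int i = 0; i < h1.Length; i++)
        {
            if (h1[i] != h2[i]) return false;
        }
        return true;
    }
}
```

Both files are always read in full here, and equal digests only make equal contents very likely rather than certain.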

+3

It also depends on what you are trying to solve. Are you trying to answer the question "in this directory of N files, which are exact duplicates?" Or the question "are these two files the same?"

If you are simply comparing two files, then a byte-by-byte check is more efficient.

But if you are trying to find all duplicate pairs among N files, it is better to use an MD5 hash, because you can compute and store each file's hash once and then compare these much smaller values for each pair of files. Otherwise, you would iterate over each file's byte stream once for every other file in the directory.
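A rough sketch of the hash-once approach for N files is shown below. The class `DuplicateFinder` and the method `GroupByMd5` are illustrative names, not an existing API. Each file is hashed exactly once and the paths are grouped by digest.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class DuplicateFinder
{
    // Hypothetical helper: maps each MD5 digest (as a hex string) to the
    // list of paths whose contents produced that digest.
    public static Dictionary<string, List<string>> GroupByMd5(IEnumerable<string> paths)
    {
        Dictionary<string, List<string>> groups = new Dictionary<string, List<string>>();
        using (MD5 md5 = MD5.Create())
        {
            foreach (string path in paths)
            {
                string key;
                using (FileStream stream = File.OpenRead(path))
                {
                    // Each file is read once, no matter how many files there are.
                    key = BitConverter.ToString(md5.ComputeHash(stream));
                }

                List<string> bucket;
                if (!groups.TryGetValue(key, out bucket))
                {
                    bucket = new List<string>();
                    groups[key] = bucket;
                }
                bucket.Add(path);
            }
        }
        return groups;
    }
}
```

Any group with more than one path is a set of candidate duplicates. Since different contents can in principle share a digest, a byte-by-byte check within each group would confirm them.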

+1

I implemented a very specialized version of diff a year ago (I had a file of more than 6 GB that I had to compare), so I know the inner workings of diff (lots of copying and pasting, of course). Some thoughts:

  • If you just want to know whether they differ, compare them byte by byte. Optimize by first checking whether their sizes (lengths) differ, and only then reading the files one byte at a time and comparing. You do not need to worry about buffering, as your file API should do this for you (.NET).
  • If there are rules that you would like to apply to the comparison:
    • If you want to ignore whitespace or any other character, then each time you read a byte, check whether it should be ignored. If so, read the next byte, but only from that file.
    • If there are rules that apply per line, then read the files line by line. Then hash each line, ignoring everything you want to ignore.
    • Remember that a line can be defined as a variable-length record with a newline as its terminator (delimiter). That way, you can define a line to be whatever you want, read exactly that, then hash and compare.
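The first rule in the list above (skip an ignored byte in one file only, then keep comparing) might look roughly like this. It is a sketch under stated assumptions: the names `WhitespaceInsensitiveCompare`, `NextSignificantByte`, and `AreEqualIgnoringWhitespace` are invented for the example, and "whitespace" is taken to mean the single-byte characters space, tab, CR, and LF, which suits ASCII or UTF-8 text but not UTF-16 files.

```csharp
using System;
using System.IO;

public static class WhitespaceInsensitiveCompare
{
    // Reads the next byte that is not whitespace; returns -1 at end of stream.
    private static int NextSignificantByte(Stream stream)
    {
        int b;
        do
        {
            b = stream.ReadByte();
        } while (b == ' ' || b == '\t' || b == '\r' || b == '\n');
        return b;
    }

    // Hypothetical helper: true when the files match once whitespace bytes are skipped.
    public static bool AreEqualIgnoringWhitespace(string path1, string path2)
    {
        using (FileStream a = File.OpenRead(path1))
        using (FileStream b = File.OpenRead(path2))
        {
            while (true)
            {
                // Each stream skips its own ignored bytes independently.
                int x = NextSignificantByte(a);
                int y = NextSignificantByte(b);
                if (x != y) return false; // mismatch, or only one file has ended
                if (x == -1) return true; // both files ended together
            }
        }
    }
}
```

The length check cannot be used as a shortcut here, because two files can differ in size and still match once whitespace is ignored.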

I can contribute code if you want. Diffing files is harder, because you also have to output what is different.

+1

Going by the question (the simplest approach, and these are text files):

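 // Requires: using System.IO;
 using (StreamReader sr1 = new StreamReader(filePath1))
 using (StreamReader sr2 = new StreamReader(filePath2))
 {
     // Reads both files fully into memory and compares the text.
     if (sr1.ReadToEnd() == sr2.ReadToEnd())
     {
         // The files have the same text: do stuff.
     }
 }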

It's not fast or pretty, but it's easy.

0
 if ( $file1 != $file2 ) return true; 

Of course this varies between VB and C#.

0
