How to compare two folders for non-identical files based on name?

I have two folders, A and B. Inside A there are several files, and inside B there are several files. I have to check the files in A against the files in B for non-identical files. I tried the following, but it gives the full search result:

    var filesnotinboth = from f1 in dir1.GetFiles("*", SearchOption.AllDirectories)
                         from f2 in dir2.GetFiles("*", SearchOption.AllDirectories)
                         where f1.Name != f2.Name
                         select f1.Name;

Any suggestions?

+4
3 answers

Well, for one thing, this approach is very inefficient: it calls dir2.GetFiles every time it starts on a new f1. It also produces a result for every f2 that does not match the current f1, so even if f1 matches some other f2, it is still output. Suppose dir1 contains A, B and C, and dir2 contains C and D. You will end up with the following:

    f1  f2  Result of where?
    A   C   True
    A   D   True
    B   C   True
    B   D   True
    C   C   False
    C   D   True

So the result will be A, A, B, B, C: you still get C (which you did not want), just not as often as A and B.
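The trace can be reproduced without touching the file system. The following is a minimal sketch that substitutes in-memory string arrays for the two GetFiles calls; the class name CrossJoinDemo and the helper NamesFromCrossJoin are made up for illustration and are not part of the original question.

```csharp
using System;
using System.Linq;

public static class CrossJoinDemo
{
    // Mirrors the query from the question, with plain names standing in
    // for the FileInfo objects (an assumption made for this sketch).
    public static string[] NamesFromCrossJoin(string[] dir1Names, string[] dir2Names)
    {
        var query = from f1 in dir1Names
                    from f2 in dir2Names
                    where f1 != f2      // true for every non-matching pair
                    select f1;
        return query.ToArray();
    }

    public static void Main()
    {
        var result = NamesFromCrossJoin(new[] { "A", "B", "C" }, new[] { "C", "D" });
        Console.WriteLine(string.Join(", ", result)); // prints "A, A, B, B, C"
    }
}
```

Each f1 is emitted once per f2 that differs from it, which is why C still appears even though it exists in both inputs.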

You want to use set operations, for example:

    var dir1Files = dir1.GetFiles("*", SearchOption.AllDirectories)
                        .Select(x => x.Name);
    var dir2Files = dir2.GetFiles("*", SearchOption.AllDirectories)
                        .Select(x => x.Name);
    var onlyIn1 = dir1Files.Except(dir2Files);

That should work, and be more efficient.
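For comparison, here is a small sketch of Except applied to the same sample names (A, B, C in the first folder; C, D in the second). The in-memory arrays, the class name ExceptDemo and the helper OnlyInFirst are assumptions for illustration; real code would feed in the names obtained from GetFiles.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptDemo
{
    // Returns the distinct names that occur in 'first' but not in 'second'.
    public static List<string> OnlyInFirst(IEnumerable<string> first, IEnumerable<string> second)
    {
        return first.Except(second).ToList();
    }

    public static void Main()
    {
        var onlyIn1 = OnlyInFirst(new[] { "A", "B", "C" }, new[] { "C", "D" });
        Console.WriteLine(string.Join(", ", onlyIn1)); // prints "A, B"
    }
}
```

Except compares strings with the default (case-sensitive) comparer. If file names should be treated as case-insensitive, which depends on the file system in use, the overload that takes an IEqualityComparer<string> such as StringComparer.OrdinalIgnoreCase may be a better fit.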

EDIT: I assumed that you want the files that are in A but not in B, possibly based on an earlier version of the question. (I'm not sure whether it was edited in the first five minutes. Obviously, the current code will not return anything that is in B but not in A.)

If you want the symmetric difference, use HashSet<T>.SymmetricExceptWith:

    var inExactlyOneDirectory = new HashSet<string>(dir1Files);
    inExactlyOneDirectory.SymmetricExceptWith(dir2Files);

(Note that I don't like the fact that SymmetricExceptWith is a void method that mutates the existing set instead of returning a new set or just a sequence. Among other things, it means that the variable name is only accurate after the second statement, not the first.)
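As a rough illustration of the mutate-in-place behaviour, the sketch below wraps the two statements in a helper so the set is only exposed once it really holds the symmetric difference. The class name SymmetricDifferenceDemo, the helper InExactlyOne and the sample names are assumptions, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SymmetricDifferenceDemo
{
    // Builds a set from 'first', then mutates it in place so that it keeps
    // only the names present in exactly one of the two sequences.
    public static HashSet<string> InExactlyOne(IEnumerable<string> first, IEnumerable<string> second)
    {
        var result = new HashSet<string>(first);
        result.SymmetricExceptWith(second);
        return result;
    }

    public static void Main()
    {
        var diff = InExactlyOne(new[] { "A", "B", "C" }, new[] { "C", "D" });
        // HashSet<T> does not guarantee an enumeration order, so sort before printing.
        Console.WriteLine(string.Join(", ", diff.OrderBy(n => n, StringComparer.Ordinal))); // prints "A, B, D"
    }
}
```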

EDIT: If you need uniqueness on both name and size, you really need an anonymous type representing both. Unfortunately, it is then awkward to create a HashSet<T>, because the type argument cannot be written out. So you will need an extension method like this:

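    // Extension methods must live in a non-generic static class;
    // the class name here is arbitrary.
    public static class EnumerableExtensions
    {
        public static HashSet<T> ToHashSet<T>(this IEnumerable<T> set)
        {
            return new HashSet<T>(set);
        }
    }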

Then:

    var dir1Files = dir1.GetFiles("*", SearchOption.AllDirectories)
                        .Select(x => new { x.Name, x.Length });
    var dir2Files = dir2.GetFiles("*", SearchOption.AllDirectories)
                        .Select(x => new { x.Name, x.Length });
    var difference = dir1Files.ToHashSet();
    difference.SymmetricExceptWith(dir2Files);
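A variation on the same idea, offered only as a sketch: it swaps the anonymous type for a C# 7 value tuple, which also compares by value but can be named in a method signature, so no extension method is needed. The class name NameAndSizeDemo, the helper Difference and the sample file names and sizes are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NameAndSizeDemo
{
    // Entries are equal only when both name and length match, so a file with
    // the same name but a different size shows up once for each side.
    public static HashSet<(string Name, long Length)> Difference(
        IEnumerable<(string Name, long Length)> first,
        IEnumerable<(string Name, long Length)> second)
    {
        var result = new HashSet<(string Name, long Length)>(first);
        result.SymmetricExceptWith(second);
        return result;
    }

    public static void Main()
    {
        // Made-up sample data standing in for FileInfo.Name / FileInfo.Length.
        var dir1 = new[] { (Name: "a.txt", Length: 10L), (Name: "b.txt", Length: 20L) };
        var dir2 = new[] { (Name: "a.txt", Length: 10L), (Name: "b.txt", Length: 25L) };

        foreach (var entry in Difference(dir1, dir2).OrderBy(e => e.Length))
        {
            // prints "b.txt (20 bytes)" then "b.txt (25 bytes)"
            Console.WriteLine($"{entry.Name} ({entry.Length} bytes)");
        }
    }
}
```

With real directories, the tuples could be produced by projecting each FileInfo with Select(x => (x.Name, x.Length)).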
+9

Jon Skeet's answer should help you understand why your current solution does not work and is fundamentally inefficient.

As for solving the problem, one option is the HashSet<T>.SymmetricExceptWith method, which "modifies the current HashSet<T> object to contain only elements that are present either in that object or in the specified collection, but not both."

    // Thanks to Jon Skeet for the template
    var dir1Files = dir1.GetFiles("*", SearchOption.AllDirectories)
                        .Select(x => x.Name);
    var dir2Files = dir2.GetFiles("*", SearchOption.AllDirectories)
                        .Select(x => x.Name);
    var filesNotInBoth = new HashSet<string>(dir1Files);
    filesNotInBoth.SymmetricExceptWith(dir2Files);
+2
    var files2 = dir2.GetFiles("*", SearchOption.AllDirectories);
    var filesnotinboth = dir1.GetFiles("*", SearchOption.AllDirectories)
                             .Where(f1 => !files2.Any(f2 => f2.Name == f1.Name));
-1