Comparing file names only, not content: is my solution correct?

We are transferring a large number of documents and images, and before actually storing these documents on the SQL Server I want to compare two file lists.

  • My list of file paths (a text file containing a list of file paths, converted to a HashSet)

  • Their list of file paths (read from disk on the fly and turned into a HashSet)

    public static HashSet<string> ToHashSet(this string rootDirectory)
    {
        const string searchPattern = "*.*";
        string[] files = Directory.GetFiles(rootDirectory, searchPattern, SearchOption.AllDirectories);
        return new HashSet<string>(files);
    }
    

So I am comparing myHashSet with theirHashSet.

I'm getting a little paranoid here and want to double-check that Except does what I think it does.

Except = "Given two hash sets, compare all file paths and return the ones in theirList that are not found in myList."

I wrote a small test that shows Except does find the difference.

Is this the right and best way to compare a large number of files?

Dummy ProofOfConcept

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        const string rootDirectory = @"C:\Tests";
        // myHashSet holds File0..File9, theirHashSet holds File0..File11.
        HashSet<string> myHashSet = CreateDummyHashSet(rootDirectory, 10);
        HashSet<string> theirHashSet = CreateDummyHashSet(rootDirectory, 12);

        // Paths that are in theirHashSet but not in myHashSet (File10 and File11).
        IEnumerable<string> result = theirHashSet.Except(myHashSet);

        foreach (var file in result)
        {
            Console.WriteLine(file);
        }
        Console.Read();
    }

    // Builds a set of fake paths such as C:\Tests\File0.txt; no files are created on disk.
    public static HashSet<string> CreateDummyHashSet(string rootDirectory, int numberOfFiles)
    {
        var dummyHashSet = new HashSet<string>();
        const string extension = ".txt";
        const string fileName = "File";
        for (int i = 0; i < numberOfFiles; i++)
        {
            string fullFileName = string.Format("{0}{1}{2}", fileName, i, extension);
            string path = Path.Combine(rootDirectory, fullFileName);
            dummyHashSet.Add(path);
        }
        return dummyHashSet;
    }
}

Yes, that is correct. A HashSet is the right tool for this.

sbrauen

var result = theirHashSet.Where(x => !myHashSet.Contains(x));

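With n and m being the number of entries in theirHashSet and myHashSet, this takes about n lookups, and each HashSet lookup is O(1) on average. Also, note the difference between Except and ExceptWith: Except is an extension method on IEnumerable, while ExceptWith is a method of HashSet<>.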

EDIT:

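Looking at the decompiled code, Except works on any IEnumerable and leaves both inputs untouched, whereas ExceptWith modifies theirHashSet itself. Also, ExceptWith removes items straight from the existing hash table, while Except builds a new set first: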

Set<TSource> set = new Set<TSource>(comparer);
foreach (TSource tSource in second)
{
    set.Add(tSource);
}
foreach (TSource tSource1 in first)
{
    if (!set.Add(tSource1))
    {
        continue;
    }

    yield return tSource1;
}

ExceptWith:

foreach (T t in other)
{
    this.Remove(t);
}
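The practical difference between the two calls can be seen in a short sketch. The class and method names here (ExceptDemo, MissingCopy, MissingInPlace) are made up for illustration and are not part of the framework:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptDemo
{
    // Returns the items of 'theirs' that are not in 'mine'.
    // Neither input set is changed; ToList() forces the lazy query to run.
    public static List<string> MissingCopy(HashSet<string> theirs, HashSet<string> mine)
    {
        return theirs.Except(mine).ToList();
    }

    // Removes from 'theirs' every item that is also in 'mine'.
    // 'theirs' itself is modified and nothing new is allocated.
    public static void MissingInPlace(HashSet<string> theirs, HashSet<string> mine)
    {
        theirs.ExceptWith(mine);
    }
}
```

If theirHashSet is still needed afterwards (for example to log the full list), Except is the safer choice. If it is a throwaway set, ExceptWith avoids building a second set.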



Another option:

var result = theirHashSet.Where(x => !myHashSet.Contains(x));
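One caveat applies to every variant shown so far, assuming the paths come from a Windows file system: the default string comparer is case-sensitive, so C:\Tests\File1.txt and c:\tests\file1.txt would be reported as different files. A minimal sketch of loading the text file into a case-insensitive set and doing the lookup against it follows. PathListComparer, LoadPathList and FindMissing are hypothetical names, and the one-path-per-line file format is an assumption:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PathListComparer
{
    // Reads one path per line (assumed format), skipping blank lines,
    // into a set that ignores case when comparing paths.
    public static HashSet<string> LoadPathList(string listFile)
    {
        IEnumerable<string> lines = File.ReadLines(listFile)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0);
        return new HashSet<string>(lines, StringComparer.OrdinalIgnoreCase);
    }

    // Returns every path in 'theirPaths' that 'myPaths' does not contain.
    // Contains uses the comparer that 'myPaths' was created with.
    public static List<string> FindMissing(IEnumerable<string> theirPaths, HashSet<string> myPaths)
    {
        return theirPaths.Where(path => !myPaths.Contains(path)).ToList();
    }
}
```

Because FindMissing accepts any IEnumerable<string>, their list can be streamed with Directory.EnumerateFiles instead of being materialized first. If Except is preferred, the comparer has to be passed explicitly, as in theirHashSet.Except(myHashSet, StringComparer.OrdinalIgnoreCase), because the LINQ method does not pick up the comparer stored in either set.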

The documentation is pretty straightforward: Except returns the values from the first HashSet that are not in the second.

You asked: "Is this the right and best way to compare a large number of files?"

However, in your case it only compares the lists of file paths, not the contents of the files. After the check you know whether files with the same name exist on both sides, but not whether those files are actually identical.
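If the contents matter as well, one possible follow-up step is to hash the files whose paths match. The sketch below is only an illustration: FileContentComparer, ComputeHash and SameContent are made-up names, and SHA-256 is one reasonable choice of hash rather than a requirement.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class FileContentComparer
{
    // Computes the SHA-256 hash of a file's bytes and returns it as a hex string.
    public static string ComputeHash(string path)
    {
        using (SHA256 sha = SHA256.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] hash = sha.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    // True when both files have the same length and the same hash.
    public static bool SameContent(string pathA, string pathB)
    {
        // Files of different sizes cannot match, so skip the expensive hashing.
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
        {
            return false;
        }
        return ComputeHash(pathA) == ComputeHash(pathB);
    }
}
```

Hashing reads every byte of both files, so for a large transfer it is much slower than the path comparison and is probably best limited to the files that need verifying.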
