Check the uniqueness of an array of objects

I am reading data from files (e.g. CSV and Excel) and must make sure that each line in the file is unique.

Each line is represented as an object[]. This cannot be changed due to the current architecture. The objects in the array can be of different types (decimal, string, int, etc.).

The file may look like this:

foo    1      5 // Not unique
bar    1      5
bar    2      5
foo    1      5 // Not unique

The file will probably have 200,000 rows and 4-100 columns.

The code I have now looks like this:

IList<object[]> rows = new List<object[]>();

using (var reader = _deliveryObjectReaderFactory.CreateReader(deliveryObject))
{
    // Read the row.
    while (reader.Read())
    {
        // Get the values from the file.
        var values = reader.GetValues();

        // Check uniqueness for row
        foreach (var row in rows)
        {
            bool rowsAreDifferent = false;

            // Check uniqueness for column.
            for (int i = 0; i < row.Length; i++)
            {
                var earlierValue = row[i];
                var newValue = values[i];
                if (earlierValue.ToString() != newValue.ToString())
                {
                    rowsAreDifferent = true;
                    break;
                }
            }
            if(!rowsAreDifferent)
                throw new Exception("Rows are not unique");
        }
        rows.Add(values);
    }
}

So my question is: can this be done more efficiently? For example, by computing a hash for each row and verifying that the hash is unique instead?

1 answer

You can use a HashSet<object[]> with a custom IEqualityComparer<object[]>:

HashSet<object[]> rows = new HashSet<object[]>(new MyComparer());

while (reader.Read())
{
    // Get the values from the file.
    var values = reader.GetValues();    
    if (!rows.Add(values))
        throw new Exception("Rows are not unique");
}

MyComparer could look like this:

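using System.Collections.Generic;
using System.Linq;

public class MyComparer : IEqualityComparer<object[]>
{
    public bool Equals(object[] x, object[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null) || x.Length != y.Length) return false;
        // Compare element by element with object.Equals; == on object-typed values only compares references.
        return x.Zip(y, (a, b) => object.Equals(a, b)).All(c => c);
    }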
    public int GetHashCode(object[] obj)
    {
        unchecked
        {
            // this returns 0 if obj is null
            // otherwise it combines the hashes of all elements
            // like hash = (hash * 397) ^ nextHash
            // if an array element is null its hash is assumed as 0
            // (this is the ReSharper suggestion for GetHashCode implementations)
            return obj?.Aggregate(0, (hash, o) => (hash * 397) ^ (o?.GetHashCode() ?? 0)) ?? 0;
        }
    }
}
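
As a rough usage sketch, the sample rows from the question can be run through a HashSet with a value-based comparer. The names RowComparer, DuplicateRowDemo and FirstDuplicateIndex are hypothetical and exist only for this example. It uses SequenceEqual in place of Zip/All and returns the index of the first repeated row instead of throwing. It assumes the reader yields values whose Equals and GetHashCode are value-based, which holds for string, int and decimal.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer for this sketch: two rows are equal when they have
// the same length and every pair of elements is equal by value.
public sealed class RowComparer : IEqualityComparer<object[]>
{
    public bool Equals(object[] x, object[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        // SequenceEqual uses the default equality comparer (object.Equals)
        // and returns false when the lengths differ.
        return x.SequenceEqual(y);
    }

    public int GetHashCode(object[] row)
    {
        if (row == null) return 0;
        unchecked
        {
            int hash = 0;
            foreach (var value in row)
                hash = (hash * 397) ^ (value?.GetHashCode() ?? 0);
            return hash;
        }
    }
}

public static class DuplicateRowDemo
{
    // Returns the zero-based index of the first row that repeats an earlier
    // row, or -1 when all rows are unique.
    public static int FirstDuplicateIndex(IEnumerable<object[]> rows)
    {
        var seen = new HashSet<object[]>(new RowComparer());
        int index = 0;
        foreach (var row in rows)
        {
            if (!seen.Add(row)) return index;
            index++;
        }
        return -1;
    }

    public static void Main()
    {
        var rows = new List<object[]>
        {
            new object[] { "foo", 1, 5m },
            new object[] { "bar", 1, 5m },
            new object[] { "bar", 2, 5m },
            new object[] { "foo", 1, 5m }, // repeats the first row
        };
        Console.WriteLine(FirstDuplicateIndex(rows)); // prints 3
    }
}
```

Each row is hashed once and compared only against rows that land in the same bucket, so the check is roughly linear in the number of rows rather than quadratic as in the nested loop from the question.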

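One caveat concerns the element comparison: because the elements are typed as object, a == b compares references rather than values, so two boxed ints holding the same number are not equal under ==. object.Equals(a, b) compares the values instead.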
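
To illustrate the difference between a == b and object.Equals(a, b) on object-typed values, here is a minimal sketch. The class and method names (BoxedEqualityDemo, ReferenceCompare, ValueCompare) are made up for this example.

```csharp
using System;

public static class BoxedEqualityDemo
{
    // On operands typed as object, == is a reference comparison.
    public static bool ReferenceCompare(object a, object b) => a == b;

    // object.Equals handles nulls and then calls the virtual Equals,
    // so boxed value types are compared by value.
    public static bool ValueCompare(object a, object b) => object.Equals(a, b);

    public static void Main()
    {
        object first = 5;   // boxed int
        object second = 5;  // a separate box holding the same value

        Console.WriteLine(ReferenceCompare(first, second)); // False
        Console.WriteLine(ValueCompare(first, second));     // True
    }
}
```

Strings are a trap here: identical literals are interned, so == on object may appear to work in small tests, but strings read from a file at runtime are generally distinct instances.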
