I am reading data from files (e.g. CSV and Excel) and must make sure that each line in the file is unique.
Each line will be represented as object[]. This cannot be changed due to the current architecture. Each object in the array can be of a different kind (decimal, string, int, etc.).
The file may look like this:
foo 1 5 // Not unique
bar 1 5
bar 2 5
foo 1 5 // Not unique
The file will probably have 200,000 rows and 4-100 columns.
The code I have now looks like this:
IList<object[]> rows = new List<object[]>();
using (var reader = _deliveryObjectReaderFactory.CreateReader(deliveryObject))
{
    while (reader.Read())
    {
        var values = reader.GetValues();
        foreach (var row in rows)
        {
            bool rowsAreDifferent = false;
            for (int i = 0; i < row.Length; i++)
            {
                var earlierValue = row[i];
                var newValue = values[i];
                if (earlierValue.ToString() != newValue.ToString())
                {
                    rowsAreDifferent = true;
                    break;
                }
            }
            if (!rowsAreDifferent)
                throw new Exception("Rows are not unique");
        }
        rows.Add(values);
    }
}
So my question is: can this be done more efficiently? For example, by hashing each row and verifying that the hash is unique instead? A rough sketch of what I have in mind is below.
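To make the idea concrete, this is roughly the hash-based variant I am imagining. It reuses the reader code above; the "\u001F" delimiter is just a placeholder I picked so that joined values are unlikely to collide, and the whole thing is untested:

var rows = new List<object[]>();
var seenKeys = new HashSet<string>();

using (var reader = _deliveryObjectReaderFactory.CreateReader(deliveryObject))
{
    while (reader.Read())
    {
        var values = reader.GetValues();

        // Build one string key per row. The unit separator "\u001F" is an
        // arbitrary delimiter so that ("ab", "c") and ("a", "bc") give different keys.
        var key = string.Join("\u001F", values);

        // HashSet<string>.Add returns false if the key is already present,
        // so the duplicate check is O(1) on average per row instead of
        // comparing against every earlier row.
        if (!seenKeys.Add(key))
            throw new Exception("Rows are not unique");

        rows.Add(values);
    }
}

Would something like this be the right direction, or is there a better-suited approach?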