Mark non-unique rows in a DataTable

I have a DataTable and want to check whether the combination of values in three columns (Name, LastName, Age) is unique for each row. If it is not, the last column should be filled with the line number of the first occurrence of that combination.

For example, this table:

 ID  Name  LastName  Age  Flag
 -------------------------------------
 1   Bart  Simpson   10   -
 2   Lisa  Simpson   8    -
 3   Bart  Simpson   10   -
 4   Ned   Flanders  40   -
 5   Bart  Simpson   10   -

Should lead to this result:

 Line  Name  LastName  Age  Flag
 -------------------------------------
 1     Bart  Simpson   10   -
 2     Lisa  Simpson   8    -
 3     Bart  Simpson   10   1
 4     Ned   Flanders  40   -
 5     Bart  Simpson   10   1

I solved this by looping through the DataTable with two nested For loops and comparing the values. This works fine for a small amount of data, but it gets pretty slow when the DataTable contains a lot of rows.

My question is: what is the best / fastest solution to this problem, given that the amount of data can vary between 20,000 and 100,000 rows?
Is there a way to do this using LINQ? (I'm not too familiar with this, but I want to learn!)

2 answers

Well, I think I found the answer myself. Following the suggestion in James Wiseman's answer, I tried something with LINQ.

 Dim myErrnrFnct = Function(current, first) If(first <> current, first, 0)

 Dim myQuery = From row As DataRow In myDt.AsEnumerable _
               Select New With { _
                   .LINE = row.Item("LINE"), _
                   .NAME = row.Item("NAME"), _
                   .LASTNAME = row.Item("LASTNAME"), _
                   .AGE = row.Item("AGE"), _
                   .FLAG = myErrnrFnct(row.Item("LINE"), myDt.AsEnumerable.First(Function(rowToCheck) _
                       rowToCheck.Item("NAME") = row.Item("NAME") AndAlso _
                       rowToCheck.Item("LASTNAME") = row.Item("LASTNAME") AndAlso _
                       rowToCheck.Item("AGE") = row.Item("AGE")).Item("LINE")) _
               }

With this query, I get exactly the result described in the question. The myErrnrFnct function is needed because I want the Flag column to be 0 when the row is itself the first occurrence of its combination of values.

To get a DataTable from myQuery, I had to add the extension methods described here:
How to: Implement CopyToDataTable<T> Where the Generic Type T Is Not a DataRow
After that, this line does the job:

 Dim myNewDt As DataTable = myQuery.CopyToDataTable() 

Everything seems to be working fine. Any suggestions to make this better?
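
One possible refinement, as a rough sketch rather than a tested solution: calling First inside the query rescans the table for every row, so the work still grows quadratically with the row count, much like the nested loops. A single pass with a Dictionary that remembers the LINE of the first row seen for each combination should avoid that. FlagDuplicates is a hypothetical helper name. The sketch assumes that NAME and LASTNAME are String columns, that LINE, AGE and FLAG are Integer columns, that none of them contain DBNull, and that Tuple is available (.NET 4 or later). Note that the Tuple key compares strings case-sensitively.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data

Module FlagDuplicatesSketch

    ' Hypothetical helper: writes the LINE of the first matching row into FLAG,
    ' or 0 when the row is itself the first occurrence of its combination.
    Sub FlagDuplicates(ByVal dt As DataTable)
        ' Maps each (NAME, LASTNAME, AGE) combination to the LINE of its first occurrence.
        Dim firstSeen As New Dictionary(Of Tuple(Of String, String, Integer), Integer)()

        For Each row As DataRow In dt.Rows
            ' Assumption: the three key columns hold String, String and Integer values (no DBNull).
            Dim key As Tuple(Of String, String, Integer) = _
                Tuple.Create(CStr(row("NAME")), CStr(row("LASTNAME")), CInt(row("AGE")))

            Dim firstLine As Integer = 0
            If firstSeen.TryGetValue(key, firstLine) Then
                ' Combination was seen before: point at its first occurrence.
                row("FLAG") = firstLine
            Else
                ' First occurrence: remember its LINE and mark it with 0.
                firstSeen.Add(key, CInt(row("LINE")))
                row("FLAG") = 0
            End If
        Next
    End Sub

    Sub Main()
        ' Sample data from the question.
        Dim dt As New DataTable()
        dt.Columns.Add("LINE", GetType(Integer))
        dt.Columns.Add("NAME", GetType(String))
        dt.Columns.Add("LASTNAME", GetType(String))
        dt.Columns.Add("AGE", GetType(Integer))
        dt.Columns.Add("FLAG", GetType(Integer))
        dt.Rows.Add(1, "Bart", "Simpson", 10, 0)
        dt.Rows.Add(2, "Lisa", "Simpson", 8, 0)
        dt.Rows.Add(3, "Bart", "Simpson", 10, 0)
        dt.Rows.Add(4, "Ned", "Flanders", 40, 0)
        dt.Rows.Add(5, "Bart", "Simpson", 10, 0)

        FlagDuplicates(dt)

        ' Print LINE and FLAG for each row.
        For Each row As DataRow In dt.Rows
            Console.WriteLine("{0} {1}", row("LINE"), row("FLAG"))
        Next
    End Sub

End Module
```

Because this updates the existing table in place, no CopyToDataTable extension is needed for this variant. For the sample data, rows 3 and 5 get FLAG 1 and the other rows get 0.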


I cannot comment on how to do this in C#/VB with a DataTable, but if you can move all of this into SQL, your query could look like this:

 declare @t table (ID int, Name varchar(10), LastName varchar(10), Age int)

 insert into @t values (1, 'Bart', 'Simpson', 10)
 insert into @t values (2, 'Lisa', 'Simpson', 8)
 insert into @t values (3, 'Bart', 'Simpson', 10)
 insert into @t values (4, 'Ned', 'Flanders', 40)
 insert into @t values (5, 'Bart', 'Simpson', 10)

 -- Flag is the lowest ID of an earlier row with the same values, or NULL if there is none
 select t.*,
        (select min(t2.ID)
         from @t t2
         where t2.Name = t.Name
           and t2.LastName = t.LastName
           and t2.Age = t.Age
           and t2.ID < t.ID) as Flag
 from @t t

Here I have defined a table for demonstration purposes. I suppose you could translate this to LINQ.

