Get duplicates for two columns with LINQ

LINQ is driving me crazy. Why does the following query return no duplicates, although it works with only one identifier? Where is my mistake?

    ' generate some test-data
    Dim source As New DataTable
    source.Columns.Add(New DataColumn("RowNumber", GetType(Int32)))
    source.Columns.Add(New DataColumn("Value1", GetType(Int32)))
    source.Columns.Add(New DataColumn("Value2", GetType(Int32)))
    source.Columns.Add(New DataColumn("Text", GetType(String)))
    Dim rnd As New Random()
    For i As Int32 = 1 To 100
        Dim newRow = source.NewRow
        Dim value = rnd.Next(1, 20)
        newRow("RowNumber") = i
        newRow("Value1") = value
        newRow("Value2") = (value + 1)
        newRow("Text") = String.Format("RowNumber{0}-Text", i)
        source.Rows.Add(newRow)
    Next

    ' following query does not work, it always has Count=0
    ' although it works with only one identifier
    Dim dupIdentifiers = From row In source
                         Group row By grp = New With {.Val1 = row("Value1"), .Val2 = row("Value2")}
                         Into Group
                         Where Group.Count > 1
                         Select idGroup = New With {grp.Val1, grp.Val2, Group.Count}

Edit: Below is the complete solution, thanks to @Jon Skeet's answer :)

    Dim dupKeys = From row In source
                  Group row By grp = New With {Key .Val1 = CInt(row("Value1")), Key .Val2 = CInt(row("Value2"))}
                  Into Group
                  Where Group.Count > 1
                  Select RowNumber = CInt(Group.FirstOrDefault.Item("RowNumber"))

    Dim dupRows = From row In source
                  Join dupKey In dupKeys On row("RowNumber") Equals dupKey
                  Select row

    If dupRows.Any Then
        ' create a new DataTable from the first duplicate rows
        Dim dest = dupRows.CopyToDataTable
    End If

The main problem with the grouping was that I had to make the grouped fields Key properties. The next problem in my code above was getting the duplicate rows from the source table. Since almost every row has a duplicate (according to the two fields), the resulting DataTable contained 99 of the 100 rows rather than only the 19 duplicate values. I needed to select only the first duplicate row of each group and join it with the source table on the primary key (RowNumber):

    Select RowNumber = CInt(Group.FirstOrDefault.Item("RowNumber"))

Although this works in my case, maybe someone can explain how to select only the duplicates from the source table if all I had were composite keys.


Edit: I answered the last part of the question myself, so here is all I need:

    Dim dups = From row In source
               Group By grp = New With {Key .Value1 = CInt(row("Value1")), Key .Value2 = CInt(row("Value2"))}
               Into Group
               Where Group.Count > 1
               Let Text = Group.First.Item("Text")
               Select Group.First

    If dups.Any Then
        Dim dest = dups.CopyToDataTable
    End If

I needed the Let keyword to keep the other column(s) in scope and to return only the first row of each group of duplicates. That way I can use CopyToDataTable to create a DataTable from the duplicate rows.

It takes just a few lines of code (and spares me the second query that looked up the rows in the source table) to find duplicates across several columns and create a DataTable from them.

1 answer

The problem is how anonymous types work in VB: their properties are mutable by default, and only Key properties are included in hashing and equality. Try the following:

    Group row By grp = New With {Key .Val1 = row("Value1"), Key .Val2 = row("Value2")}

(In C#, this would not be a problem: anonymous types in C# are always immutable in all properties.)
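
To see the effect in isolation, here is a minimal console sketch. It is not part of the question's code: the module name KeyPropertyDemo and the sample pairs are made up for illustration, and it assumes VB 10 / .NET 4 or later (for Tuple and array literals). It compares two anonymous-type instances with and without Key, then counts the groups each variant produces.

```vbnet
Option Strict On
Option Infer On
Imports System
Imports System.Linq

Module KeyPropertyDemo ' hypothetical name, for illustration only
    Sub Main()
        ' Without Key: the type has no key properties, so Equals and
        ' GetHashCode are not overridden and each instance equals only itself.
        Dim a = New With {.Val1 = 1, .Val2 = 2}
        Dim b = New With {.Val1 = 1, .Val2 = 2}
        Console.WriteLine(a.Equals(b)) ' False

        ' With Key: equality and the hash code come from the key property values.
        Dim c = New With {Key .Val1 = 1, Key .Val2 = 2}
        Dim d = New With {Key .Val1 = 1, Key .Val2 = 2}
        Console.WriteLine(c.Equals(d)) ' True
        Console.WriteLine(c.GetHashCode() = d.GetHashCode()) ' True

        ' Effect on grouping: three sample pairs, two of them identical.
        Dim pairs = {Tuple.Create(1, 2), Tuple.Create(1, 2), Tuple.Create(3, 4)}

        Dim plainGroups = From p In pairs
                          Group p By grp = New With {.Val1 = p.Item1, .Val2 = p.Item2}
                          Into Group

        Dim keyedGroups = From p In pairs
                          Group p By grp = New With {Key .Val1 = p.Item1, Key .Val2 = p.Item2}
                          Into Group

        Console.WriteLine(plainGroups.Count()) ' 3: every element is its own group
        Console.WriteLine(keyedGroups.Count()) ' 2: the identical pairs share a group
    End Sub
End Module
```

Without Key, every element ends up in a group of its own, so a filter such as Where Group.Count > 1 can never match. That is consistent with the question's query always returning Count=0.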
