How to find duplicate values in SQL Server

I am using SQL Server 2008. I have a table

    Customers
        customer_number int
        field1 varchar
        field2 varchar
        field3 varchar
        field4 varchar

... and many more columns that don't matter to my queries.

The customer_number column is the primary key. I am trying to find duplicate values and some differences between them.

Please help me find all rows with the same

1) field1, field2, field3, field4

2) only 3 columns are equal and one of them is not (except for the rows from list 1)

3) only 2 columns are equal, and two of them are not (except for rows from list 1 and list 2)

In the end, I want 3 tables with these results plus an additional groupId that is the same for every row in a group of similar rows (for example, in the three-equal-columns case, rows that share the same three values form one group).

Thanks.

+7
sql-server duplicates sql-server-2008
5 answers

The simplest approach would probably be to write a stored procedure that iterates over each group of customers with duplicates and inserts the matching customer numbers together with a group number.

However, I thought about this, and you can probably do it with a subquery. I hope I have not made it more complicated than it needs to be, but this should give you what you are looking for in the first table of duplicates (all four fields). Please note that this has not been tested, so a little tweaking may be required.

Basically, it finds each combination of field values that has duplicates, assigns a group number to each combination, then selects all the customers with those values and gives them the same group number.

    INSERT INTO FourFieldsDuplicates (group_no, customer_no)
    SELECT Groups.group_no, custs.customer_number
    FROM (
        -- One numbered row per field combination that occurs more than once
        SELECT ROW_NUMBER() OVER (ORDER BY c.field1) AS group_no,
               c.field1, c.field2, c.field3, c.field4
        FROM Customers c
        GROUP BY c.field1, c.field2, c.field3, c.field4
        HAVING COUNT(*) > 1
    ) Groups
    INNER JOIN Customers custs
        ON  custs.field1 = Groups.field1
        AND custs.field2 = Groups.field2
        AND custs.field3 = Groups.field3
        AND custs.field4 = Groups.field4

The rest is a little more complicated, because you need to enumerate the possible column combinations. The three-field groups would then be:

    -- Customers not already reported as four-field duplicates
    WITH Remaining AS (
        SELECT c.customer_number, c.field1, c.field2, c.field3, c.field4
        FROM Customers c
        WHERE NOT EXISTS (SELECT 1
                          FROM FourFieldsDuplicates d
                          WHERE d.customer_no = c.customer_number)
    )
    INSERT INTO ThreeFieldsDuplicates (group_no, customer_no)
    SELECT Groups.group_no, custs.customer_number
    FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY g.field1) AS group_no,
               g.field1, g.field2, g.field3, g.field4
        FROM (
            -- One branch per ignored column; NULL marks the ignored column.
            -- Rows are not grouped here so that the outer COUNT(*) counts customers.
            SELECT field1, field2, field3, NULL AS field4 FROM Remaining
            UNION ALL
            SELECT field1, field2, NULL AS field3, field4 FROM Remaining
            UNION ALL
            SELECT field1, NULL AS field2, field3, field4 FROM Remaining
            UNION ALL
            SELECT NULL AS field1, field2, field3, field4 FROM Remaining
        ) g
        GROUP BY g.field1, g.field2, g.field3, g.field4
        HAVING COUNT(*) > 1
    ) Groups
    -- NULL in Groups acts as a wildcard; this assumes the data columns themselves are never NULL
    INNER JOIN Remaining custs
        ON  (Groups.field1 IS NULL OR custs.field1 = Groups.field1)
        AND (Groups.field2 IS NULL OR custs.field2 = Groups.field2)
        AND (Groups.field3 IS NULL OR custs.field3 = Groups.field3)
        AND (Groups.field4 IS NULL OR custs.field4 = Groups.field4)

I hope this gives the correct results; I will leave the last case (two matching fields) as an exercise. :-D

+4

Here's a handy query for finding duplicates in a table. Suppose you want to find all the email addresses in a table that exist more than once:

    SELECT email, COUNT(email) AS NumOccurrences
    FROM users
    GROUP BY email
    HAVING COUNT(email) > 1

You can also use this technique to find values that occur exactly once:

    SELECT email
    FROM users
    GROUP BY email
    HAVING COUNT(email) = 1
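
Applied to the question, the same grouping idea can also hand out a group id per set of identical rows. The following is only a sketch: the @Customers table variable and its sample rows are made up for illustration, and the column names are taken from the question. It counts rows per (field1, field2, field3, field4) combination with a windowed COUNT(*), keeps the combinations that occur more than once, and numbers them with DENSE_RANK().

```sql
-- Hypothetical sample data standing in for the question's Customers table.
DECLARE @Customers TABLE (
    customer_number INT PRIMARY KEY,
    field1 VARCHAR(10),
    field2 VARCHAR(10),
    field3 VARCHAR(10),
    field4 VARCHAR(10)
);

INSERT INTO @Customers VALUES
    (1, 'a', 'b', 'c', 'd'),
    (2, 'a', 'b', 'c', 'd'),
    (3, 'a', 'b', 'c', 'e'),
    (4, 'x', 'y', 'z', 'w'),
    (5, 'x', 'y', 'z', 'w');

SELECT DENSE_RANK() OVER (ORDER BY field1, field2, field3, field4) AS group_id,
       customer_number
FROM (
    -- cnt = number of rows sharing all four field values with this row
    SELECT customer_number, field1, field2, field3, field4,
           COUNT(*) OVER (PARTITION BY field1, field2, field3, field4) AS cnt
    FROM @Customers
) x
WHERE cnt > 1              -- keep only rows that have at least one duplicate
ORDER BY group_id, customer_number;
```

With this sample data, customers 1 and 2 end up in group 1 and customers 4 and 5 in group 2; customer 3 is left out because its field4 differs. Because DENSE_RANK() is evaluated after the WHERE filter, the group ids are consecutive.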
+52

I am not sure whether you also need an equality check across different fields (e.g. field1 = field2).
If not, the following may be sufficient.

Edit

Feel free to customize the test data to provide us with inputs that give the wrong result according to your specifications.

Test Data

    DECLARE @Customers TABLE (
        customer_number INTEGER IDENTITY(1, 1)
      , field1 INTEGER
      , field2 INTEGER
      , field3 INTEGER
      , field4 INTEGER)

    INSERT INTO @Customers
              SELECT 1, 1, 1, 1
    UNION ALL SELECT 1, 1, 1, 1
    UNION ALL SELECT 1, 1, 1, NULL
    UNION ALL SELECT 1, 1, 1, 2
    UNION ALL SELECT 1, 1, 1, 3
    UNION ALL SELECT 2, 1, 1, 1

All fields are equal

    SELECT ROW_NUMBER() OVER (ORDER BY c1.customer_number)
         , c1.field1
         , c1.field2
         , c1.field3
         , c1.field4
    FROM @Customers c1
    INNER JOIN @Customers c2
        ON  c2.customer_number > c1.customer_number
        -- ISNULL(..., 0) makes NULL compare equal to NULL (and to 0)
        AND ISNULL(c2.field1, 0) = ISNULL(c1.field1, 0)
        AND ISNULL(c2.field2, 0) = ISNULL(c1.field2, 0)
        AND ISNULL(c2.field3, 0) = ISNULL(c1.field3, 0)
        AND ISNULL(c2.field4, 0) = ISNULL(c1.field4, 0)

One field is different

    SELECT ROW_NUMBER() OVER (ORDER BY field1, field2, field3, field4)
         , field1
         , field2
         , field3
         , field4
    FROM (
        -- field4 differs
        SELECT DISTINCT c1.field1, c1.field2, c1.field3, field4 = NULL
        FROM @Customers c1
        INNER JOIN @Customers c2
            ON  c2.customer_number > c1.customer_number
            AND c2.field1 = c1.field1
            AND c2.field2 = c1.field2
            AND c2.field3 = c1.field3
            AND ISNULL(c2.field4, 0) <> ISNULL(c1.field4, 0)
        UNION ALL
        -- field3 differs
        SELECT DISTINCT c1.field1, c1.field2, NULL, c1.field4
        FROM @Customers c1
        INNER JOIN @Customers c2
            ON  c2.customer_number > c1.customer_number
            AND c2.field1 = c1.field1
            AND c2.field2 = c1.field2
            AND ISNULL(c2.field3, 0) <> ISNULL(c1.field3, 0)
            AND c2.field4 = c1.field4
        UNION ALL
        -- field2 differs
        SELECT DISTINCT c1.field1, NULL, c1.field3, c1.field4
        FROM @Customers c1
        INNER JOIN @Customers c2
            ON  c2.customer_number > c1.customer_number
            AND c2.field1 = c1.field1
            AND ISNULL(c2.field2, 0) <> ISNULL(c1.field2, 0)
            AND c2.field3 = c1.field3
            AND c2.field4 = c1.field4
        UNION ALL
        -- field1 differs
        SELECT DISTINCT NULL, c1.field2, c1.field3, c1.field4
        FROM @Customers c1
        INNER JOIN @Customers c2
            ON  c2.customer_number > c1.customer_number
            AND ISNULL(c2.field1, 0) <> ISNULL(c1.field1, 0)
            AND c2.field2 = c1.field2
            AND c2.field3 = c1.field3
            AND c2.field4 = c1.field4
    ) c
+2

You can write something like this to list duplicate records; I think this works:

    USE *DATABASE_NAME*
    GO

    SELECT *YOUR_FIELD*, COUNT(*) AS dupes
    FROM *YOUR_TABLE_NAME*
    GROUP BY *YOUR_FIELD*
    HAVING COUNT(*) > 1

Enjoy

0

There is a clean way to do this with CUBE(), which aggregates over every possible combination of the columns.

    SELECT field1, field2, field3, field4
         , duplicate_row_count = COUNT(*)
         , grp_id = GROUPING_ID(field1, field2, field3, field4)
    INTO #duplicate_rows
    FROM table_name
    GROUP BY CUBE(field1, field2, field3, field4)
    HAVING COUNT(*) > 1
       AND GROUPING_ID(field1, field2, field3, field4) IN (0, 1, 2, 4, 8, 3, 5, 6, 9, 10, 12)

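The numbers (0,1,2,4,8,3,5,6,9,10,12) are just the bitmasks (0000, 0001, 0010, 0100, ..., 1010, 1100) of the groupings we are interested in: those with 4, 3 or 2 matching columns. A set bit means the corresponding column was aggregated away, so it is not part of the match.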
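
To see how the bitmask relates to the columns, here is a small sketch on two made-up rows that differ only in field4 (the @t table variable and its values are assumptions for illustration). In GROUPING_ID(field1, field2, field3, field4) the last argument is the lowest bit, so field4 contributes 1, field3 contributes 2, field2 contributes 4 and field1 contributes 8.

```sql
-- Hypothetical two-row sample: identical except for field4.
DECLARE @t TABLE (field1 INT, field2 INT, field3 INT, field4 INT);

INSERT INTO @t VALUES
    (1, 1, 1, 1),
    (1, 1, 1, 2);

SELECT field1, field2, field3, field4,
       COUNT(*) AS duplicate_row_count,
       GROUPING_ID(field1, field2, field3, field4) AS grp_id
FROM @t
GROUP BY CUBE(field1, field2, field3, field4)
HAVING COUNT(*) > 1        -- keep only groupings that contain both rows
ORDER BY grp_id;
```

With this data the two rows only fall into the same group when field4 is aggregated away, so every grp_id returned is odd (bit 1 set). The row with grp_id = 1 is the "three columns match, field4 differs" case.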

Then join this back to the original table, treating NULL in #duplicate_rows as a wildcard:

    SELECT a.*
    FROM table_name a
    INNER JOIN #duplicate_rows b
        ON  NULLIF(b.field1, a.field1) IS NULL
        AND NULLIF(b.field2, a.field2) IS NULL
        AND NULLIF(b.field3, a.field3) IS NULL
        AND NULLIF(b.field4, a.field4) IS NULL
    --WHERE grp_id IN (0)                 --Use this for 4 matches
    --WHERE grp_id IN (1,2,4,8)           --Use this for 3 matches
    --WHERE grp_id IN (3,5,6,9,10,12)     --Use this for 2 matches
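
The question also asks for a groupId shared by all rows of a group. One possible way to get it, sketched here for the three-match case only, is to number the rows of the CUBE result with DENSE_RANK() while joining back. The @Customers table variable and its rows are made up for illustration, a CTE stands in for #duplicate_rows, and the sketch assumes the data columns themselves never contain NULL (otherwise a real NULL would be mistaken for a wildcard).

```sql
-- Hypothetical sample data standing in for table_name.
DECLARE @Customers TABLE (
    customer_number INT PRIMARY KEY,
    field1 VARCHAR(10),
    field2 VARCHAR(10),
    field3 VARCHAR(10),
    field4 VARCHAR(10)
);

INSERT INTO @Customers VALUES
    (1, 'a', 'b', 'c', 'd'),
    (2, 'a', 'b', 'c', 'e'),
    (3, 'x', 'b', 'c', 'd'),
    (4, 'q', 'r', 's', 't');

WITH dup AS (
    -- Groupings in which exactly one column was aggregated away (3 matches)
    SELECT field1, field2, field3, field4,
           GROUPING_ID(field1, field2, field3, field4) AS grp_id
    FROM @Customers
    GROUP BY CUBE(field1, field2, field3, field4)
    HAVING COUNT(*) > 1
       AND GROUPING_ID(field1, field2, field3, field4) IN (1, 2, 4, 8)
)
SELECT DENSE_RANK() OVER (ORDER BY d.grp_id, d.field1, d.field2,
                                   d.field3, d.field4) AS group_id,
       c.customer_number
FROM @Customers c
INNER JOIN dup d
    -- NULL in dup is a wildcard for the aggregated column
    ON  NULLIF(d.field1, c.field1) IS NULL
    AND NULLIF(d.field2, c.field2) IS NULL
    AND NULLIF(d.field3, c.field3) IS NULL
    AND NULLIF(d.field4, c.field4) IS NULL
ORDER BY group_id, c.customer_number;
```

Note that a customer can appear in more than one group: here customer 1 matches customer 2 on field1 to field3 and customer 3 on field2 to field4. Rows that are already full four-column duplicates are not excluded by this sketch; if that matters, filter them out before the CUBE step.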
0