MySQL - eliminating duplicates while preserving valuable data?

Scenario: I have a few duplicate contacts in a table. The duplicates are already identified and I could simply delete them, but I do not want to lose data that a duplicate has and the original lacks. Any tips?

Sample data:

 ID  Name  Email  School  Dupe_Flag  Key
 1   AAA   a@a            X          1
 2   AAB          JKL                1
 3   BBB   b@b    MNO     X          2
 4   BBC                             2

Required result:

 ID  Name  Email  School  Dupe_Flag  Key
 1   AAA   a@a            X          1
 2   AAB   a@a    JKL                1
 3   BBB   b@b    MNO     X          2
 4   BBC   b@b    MNO                2

How are the two records related? Both have the same Key value, and only one of them has Dupe_Flag set; that row is the duplicate.

In the above case, ID 1 will be deleted, but the email address from ID 1 must be applied to ID 2.

How much data is there? I have several hundred rows and several hundred duplicates. Writing an UPDATE statement by hand for each row is cumbersome and impractical.

Business rules for determining which data takes precedence:

If a column in the original / good record (the one where Dupe_Flag is NOT set) has no data, and the corresponding column in the duplicate record (same Key value) does have data, then that column of the original record should be updated from the duplicate.

Any help / script really appreciated! Thanks guys:)

+7
mysql duplicates
4 answers

Assuming the empty values are NULL, something like this should output the necessary data:

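 -- `table` stands in for the real table name; column names follow the question (Dupe_Flag, Key).
 -- Each Key is assumed to have exactly one partner row, so no GROUP BY is needed.
 SELECT a.ID,
        IF(a.Dupe_Flag IS NULL, IF(a.Name IS NULL, b.Name, a.Name), a.Name) AS Name,
        IF(a.Dupe_Flag IS NULL, IF(a.Email IS NULL, b.Email, a.Email), a.Email) AS Email,
        IF(a.Dupe_Flag IS NULL, IF(a.School IS NULL, b.School, a.School), a.School) AS School,
        a.Dupe_Flag,
        a.`Key`
 FROM `table` a
 JOIN `table` b ON a.`Key` = b.`Key` AND a.ID != b.ID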

Note that including this in an UPDATE statement is pretty straightforward
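A minimal sketch of what that UPDATE could look like, not a tested drop-in. It assumes the table is called test1 (the name used in another answer here), that empty cells are NULL rather than empty strings, that Dupe_Flag is NULL on the good row, and that each Key has one flagged duplicate:

```sql
-- Backfill missing values on the good row from its flagged duplicate.
-- Assumptions: table name test1, blanks stored as NULL, one duplicate per Key.
UPDATE test1 AS orig
JOIN test1 AS dupe
  ON dupe.`Key` = orig.`Key`
 AND dupe.Dupe_Flag IS NOT NULL
SET orig.Email  = COALESCE(orig.Email,  dupe.Email),
    orig.School = COALESCE(orig.School, dupe.School)
WHERE orig.Dupe_Flag IS NULL;

-- Only after checking the result: remove the flagged duplicates.
DELETE FROM test1 WHERE Dupe_Flag IS NOT NULL;
```

COALESCE keeps whatever the good row already has and only falls back to the duplicate's value when the good row's column is NULL, which matches the precedence rule in the question. If the blanks are empty strings instead, wrapping the column as NULLIF(orig.Email, '') inside the COALESCE should give the same effect. With more than one duplicate per Key, which duplicate supplies the value is not defined.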

+2

I don't know the specifics of this case, but it is probably best to prevent the problem by declaring the relevant columns "unique", so that any query that tries to insert a duplicate fails. I think the elegant solution is to stop duplicates at the time of data entry.
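As a rough illustration of that idea, assuming the table is called test1 (as in another answer here) and that Email is the column that should never repeat; the index name uq_test1_email is made up for the example:

```sql
-- Reject future rows whose Email already exists.
-- Assumptions: table name test1; index name uq_test1_email is hypothetical.
ALTER TABLE test1
  ADD UNIQUE INDEX uq_test1_email (Email);
```

Two caveats: the ALTER itself fails with a duplicate-entry error while duplicate emails are still in the table, so the existing data has to be cleaned up first, and a MySQL unique index still allows any number of rows where Email is NULL.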

I like to use this query to track duplicates:

 SELECT Email, COUNT(Email) AS occurrences FROM `table` GROUP BY Email HAVING COUNT(Email) > 1
0

While this uses a bunch of nested SELECTs and is not a complete solution, it should either spark another idea or at least point you in the right direction.

 select * from (
   select r1.ID, r1.Name,
          coalesce(r1.Email, r2.Email) as Email,
          coalesce(r1.School, r2.School) as School,
          r1.Dupe_Flag, r1.Key
   from (select * from test1 where Dupe_Flag IS NULL) as r1
   left outer join (select * from test1 where Dupe_Flag IS NOT NULL) as r2
     on r1.Key = r2.Key
 ) as results

Output:

 ID  Name  Email  School  Dupe_Flag  Key
 2   AAB   a@a    JKL     NULL       1
 4   BBC   b@b    MNO     NULL       2

Based on the data from your example.
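To try the query yourself, the sample data can be loaded along these lines. The column types are guesses, and the blanks in the question are assumed to be NULL with 'X' as the flag value:

```sql
-- Sample data from the question; column types are assumptions.
CREATE TABLE test1 (
  ID        INT PRIMARY KEY,
  Name      VARCHAR(50),
  Email     VARCHAR(100),
  School    VARCHAR(50),
  Dupe_Flag CHAR(1),      -- 'X' marks the duplicate, NULL the good row
  `Key`     INT           -- KEY is a reserved word, hence the backticks
);

INSERT INTO test1 (ID, Name, Email, School, Dupe_Flag, `Key`) VALUES
  (1, 'AAA', 'a@a', NULL,  'X',  1),
  (2, 'AAB', NULL,  'JKL', NULL, 1),
  (3, 'BBB', 'b@b', 'MNO', 'X',  2),
  (4, 'BBC', NULL,  NULL,  NULL, 2);
```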

0

The rows are unique, so there is no problem. Check your sample data again.

-1