SQL Deduplicate Tuple List

I have a table with two columns of identifiers, for example:

โ•”โ•โ•โ•โ•โ•โ•โ•โ•โ•ฆโ•โ•โ•โ•โ•โ•โ•—
โ•‘ Master โ•‘ Dupe โ•‘
โ• โ•โ•โ•โ•โ•โ•โ•โ•โ•ฌโ•โ•โ•โ•โ•โ•โ•ฃ
โ•‘ 2      โ•‘ 7    โ•‘
โ•‘ 3      โ•‘ 6    โ•‘
โ•‘ 6      โ•‘ 7    โ•‘
โ•‘ 20     โ•‘ 25   โ•‘
โ•‘ 75     โ•‘ 25   โ•‘
โ•šโ•โ•โ•โ•โ•โ•โ•โ•โ•ฉโ•โ•โ•โ•โ•โ•โ•

Each row holds the identifiers of two rows in a SQL table that are considered duplicates of each other.

This table can contain many thousands of records, and nothing about the data is guaranteed except that the Master column is sorted in ascending order, as shown above. Either column may contain an identifier that also appears in the other column, paired with the same or a different partner identifier. Again, no guarantees.

From this table, I would like to get each Master identifier together with all of its possible dupes, as shown in the table below.

Desired Results:

  • The smallest identifier should be stored as a master

Expected output (for the sample above):

โ•”โ•โ•โ•โ•โ•โ•โ•โ•โ•ฆโ•โ•โ•โ•โ•โ•โ•—
โ•‘ Master โ•‘ Dupe โ•‘
โ• โ•โ•โ•โ•โ•โ•โ•โ•โ•ฌโ•โ•โ•โ•โ•โ•โ•ฃ
โ•‘ 2      โ•‘ 3    โ•‘
โ•‘ 2      โ•‘ 6    โ•‘
โ•‘ 2      โ•‘ 7    โ•‘
โ•‘ 20     โ•‘ 25   โ•‘
โ•‘ 20     โ•‘ 75   โ•‘
โ•šโ•โ•โ•โ•โ•โ•โ•โ•โ•ฉโ•โ•โ•โ•โ•โ•โ•



+6
3

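You can do this with a recursive CTE.

So that links can be followed in either direction, the pairs are first collected both ways in the DUPES CTE.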

declare @DuplicateTest table (Master int, Dupe int);

insert into @DuplicateTest (Master, Dupe) values 
(3,6),(6,7),(2,7),
(20,25),(75,25);

;with DUPES as
(
     select distinct Master as Dupe1, Dupe as Dupe2 from @DuplicateTest
     union
     select distinct Dupe, Master from @DuplicateTest
)
,RCTE as
(
   select Dupe1 as Base, 0 as Level, Dupe1, Dupe2
   from DUPES

   union all

   select r.Base, (r.Level + 1), d.Dupe1, d.Dupe2
   from RCTE r
   join DUPES d on (r.Dupe2 = d.Dupe1 
                    and r.Dupe1 != d.Dupe2 -- don't loop on the reverse
                    and r.Base != d.Dupe2 -- don't repeat what we started from
                    and r.Level < 100) -- if the level gets too big, it is most likely a loop
)
select min(Dupe2) as Master, Base as Dupe
from RCTE
group by Base
having Base > min(Dupe2)
order by Base;
+2

Get the min master in a CTE and cross join it with every other identifier.

;WITH minmaster as (select MIN(MASTER) master
FROM myTable)
select distinct m.master
, i.dupe
from minmaster m 
cross join (select dupe dupe from myTable union all select master from myTable) i
WHERE i.dupe <> m.master

Update:


;WITH myTable AS 
(SELECT 2 MASTER, 7 dupe
UNION all SELECT 3, 6
UNION all SELECT 6, 7
UNION all SELECT 20, 25
UNION all SELECT 75, 25
UNION all SELECT 100, 125
UNION all SELECT 150, 300
UNION all SELECT 180, 300
)
, cte AS 
(
SELECT m.master L, m.dupe R, ROW_NUMBER() OVER (ORDER BY master) rnkC
FROM myTable m
)
, cte2 AS 
(
SELECT m.master L, m.dupe R, ROW_NUMBER() OVER (ORDER BY master) rnkC2
FROM myTable m
)
, cteCur AS 
(
SELECT TOP 1 cte.l, cte.R, cte.rnkC
FROM cte
UNION ALL
SELECT 
CASE WHEN cteCur.r IN (SELECT dupe 
                        FROM myTable 
                        WHERE MASTER <> cteCur.L AND dupe = cteCur.R) 
    THEN cteCur.L 
    ELSE (SELECT cte2.L 
            FROM cte2 
            WHERE cte2.rnkC2 = cteCur.rnkC + 1) 
    END
, CASE WHEN cteCur.r IN (SELECT dupe 
                            FROM myTable 
                            WHERE MASTER <> cteCur.L AND dupe = cteCur.R) 
        THEN (SELECT cte2.L 
                FROM cte2 
                WHERE cte2.R = cteCur.R AND cte2.L <> cteCur.L) 
        ELSE (SELECT cte2.R 
                FROM cte2 
                WHERE cte2.rnkC2 = cteCur.rnkC + 1) 
        END
, cteCur.rnkC + 1
FROM cteCur
WHERE cteCur.L IS NOT NULL
)
SELECT cteCur.L Master
, cteCur.R Dupe
FROM cteCur
WHERE L IS NOT NULL
ORDER BY L, R
+4

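This looks like a connected-components problem, the kind usually handled with UnionFind. Googling for a SQL implementation may turn up something useful.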
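For what it's worth, here is a minimal sketch of the UnionFind idea outside SQL, in Python. It assumes the pairs have already been read into a list of `(master, dupe)` tuples; the function name `dedupe_pairs` and the variable names are made up for illustration and are not from the question. When two groups are merged, the smaller root is always kept, so each group's root ends up being its smallest identifier.

```python
from typing import Dict, Iterable, List, Tuple


def dedupe_pairs(pairs: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (master, dupe) rows where master is the smallest ID in each group."""
    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        # Register unseen IDs as their own root.
        parent.setdefault(x, x)
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller ID as the root, so the root is always the master.
            lo, hi = (ra, rb) if ra < rb else (rb, ra)
            parent[hi] = lo

    for master, dupe in pairs:
        union(master, dupe)

    # Every non-root ID is reported as a dupe of its group's root.
    return sorted((find(x), x) for x in parent if find(x) != x)


# Sample data from the question.
sample = [(2, 7), (3, 6), (6, 7), (20, 25), (75, 25)]
result = dedupe_pairs(sample)
print(result)
```

On the sample from the question this gives the five rows of the desired table. The result would still have to be written back to the database, which is not shown here.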

+1
