Removing duplicate rows from a table in DB2 in a single query

I have a table with the columns shown below:

one | two | three | name
------------------------------------
A1    B1    C1      xyz
A1    B1    C1      pqr   -> should be deleted
A1    B1    C1      lmn   -> should be deleted
A2    B2    C2      abc
A2    B2    C2      def   -> should be deleted
A3    B3    C3      ghi
------------------------------------

There is no primary key column in the table. I have no control over the table, so I cannot add a primary key column.

As shown, I want to delete rows where the combination of the one, two and three columns is the same. So if A1 B1 C1 occurs three times, as in the example above, two of those rows should be removed and only one should remain.

How can I do this with a single query in DB2?

It has to be a single statement, since I will run it from a Java program.

+7
7 answers

(This assumes you are using DB2 for Linux/UNIX/Windows; other platforms may differ slightly.)

    DELETE FROM (
        SELECT ROWNUMBER() OVER (PARTITION BY ONE, TWO, THREE) AS RN
        FROM SESSION.TEST
    ) AS A
    WHERE RN > 1;

This should get you what you are looking for.

The query uses the OLAP function ROWNUMBER() to assign a number to each row within each ONE, TWO, THREE combination. DB2 can then match the rows referenced by the fullselect (A) to the rows that the DELETE should remove from the table. To be usable as the target of a DELETE, a fullselect must meet the definition of a deletable view (see "deletable view" in the notes section).

Below is some proof (tested on LUW 9.7):

    DECLARE GLOBAL TEMPORARY TABLE SESSION.TEST (
        one   CHAR(2),
        two   CHAR(2),
        three CHAR(2),
        name  CHAR(3)
    ) ON COMMIT PRESERVE ROWS;

    INSERT INTO SESSION.TEST VALUES
        ('A1', 'B1', 'C1', 'xyz'),
        ('A1', 'B1', 'C1', 'pqr'),
        ('A1', 'B1', 'C1', 'lmn'),
        ('A2', 'B2', 'C2', 'abc'),
        ('A2', 'B2', 'C2', 'def'),
        ('A3', 'B3', 'C3', 'ghi');

    DELETE FROM (
        SELECT ROWNUMBER() OVER (PARTITION BY ONE, TWO, THREE) AS RN
        FROM SESSION.TEST
    ) AS A
    WHERE RN > 1;

    SELECT * FROM SESSION.TEST;
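For readers without a DB2 instance at hand, the row-numbering idea can be tried with Python's built-in sqlite3 module. This is only a sketch under assumptions: SQLite (3.25 or newer, for window functions) stands in for DB2, SQLite cannot delete through a fullselect so the delete matches on rowid instead, and an ORDER BY rowid is added inside OVER so that the surviving row is predictable. The DB2 statement above has no ORDER BY, so there the row that survives in each group is arbitrary.

```python
import sqlite3

# Assumption: SQLite stands in for DB2; the table mirrors SESSION.TEST above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (one TEXT, two TEXT, three TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [
        ("A1", "B1", "C1", "xyz"),
        ("A1", "B1", "C1", "pqr"),
        ("A1", "B1", "C1", "lmn"),
        ("A2", "B2", "C2", "abc"),
        ("A2", "B2", "C2", "def"),
        ("A3", "B3", "C3", "ghi"),
    ],
)

# Number the rows inside each (one, two, three) group.
# ORDER BY rowid is an addition of this sketch to make the numbering stable.
numbered = conn.execute(
    """
    SELECT name,
           ROW_NUMBER() OVER (PARTITION BY one, two, three ORDER BY rowid) AS rn
    FROM test
    ORDER BY rowid
    """
).fetchall()

# Delete every row whose number within its group is greater than 1.
# SQLite needs the rowid detour; DB2 deletes through the fullselect directly.
deleted = conn.execute(
    """
    DELETE FROM test
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY one, two, three ORDER BY rowid) AS rn
            FROM test
        )
        WHERE rn > 1
    )
    """
).rowcount

remaining = conn.execute(
    "SELECT one, two, three, name FROM test ORDER BY rowid"
).fetchall()

print(numbered)
print(deleted)
print(remaining)
```

With the sample data, the first row of each group gets number 1 and is the only one left after the delete.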

Edit March 2, 2017:

In response to a question from Ahmed Anwar: if you need to capture what was deleted, you can also combine the delete with a select from the data change statement (FROM OLD TABLE). In this example, you could do something like the following, which returns the columns rn, one, two and three:

    SELECT * FROM OLD TABLE (
        DELETE FROM (
            SELECT ROWNUMBER() OVER (PARTITION BY ONE, TWO, THREE) AS RN
                  ,ONE
                  ,TWO
                  ,THREE
            FROM SESSION.TEST
        ) AS A
        WHERE RN > 1
    ) OLD;
+17
    DELETE FROM the_table tt
    WHERE EXISTS (
        SELECT *
        FROM the_table ex
        WHERE ex.one = tt.one
          AND ex.two = tt.two
          AND ex.three = tt.three
          AND ex.zname < tt.zname  -- tie-breaker...
    );

Note 1: Your SQL dialect may vary. Note 2: "name" is a reserved word on some platforms, so it is better to avoid it as a column name.
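The behaviour of the tie-breaker can be checked with a small sqlite3 sketch. The assumptions here are mine: SQLite stands in for DB2, the outer table is referenced by its own name instead of the alias tt, and the extra A4 group with two identical zname values is hypothetical data added to show a limit of the approach. Because the comparison is a strict `<`, two rows that agree on every column including zname have no "smaller" sibling, so both survive.

```python
import sqlite3

# Assumption: SQLite stands in for DB2; zname replaces the reserved word "name".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (one TEXT, two TEXT, three TEXT, zname TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?)",
    [
        ("A1", "B1", "C1", "xyz"),
        ("A1", "B1", "C1", "pqr"),
        ("A1", "B1", "C1", "lmn"),
        ("A2", "B2", "C2", "abc"),
        ("A2", "B2", "C2", "def"),
        ("A3", "B3", "C3", "ghi"),
        # Hypothetical rows: fully identical, so the tie-breaker cannot separate them.
        ("A4", "B4", "C4", "same"),
        ("A4", "B4", "C4", "same"),
    ],
)

# A row is deleted when another row in the same group has a smaller zname.
deleted = conn.execute(
    """
    DELETE FROM the_table
    WHERE EXISTS (
        SELECT 1
        FROM the_table AS ex
        WHERE ex.one = the_table.one
          AND ex.two = the_table.two
          AND ex.three = the_table.three
          AND ex.zname < the_table.zname
    )
    """
).rowcount

remaining = conn.execute(
    "SELECT one, two, three, zname FROM the_table ORDER BY one, two, three, zname"
).fetchall()

print(deleted)
print(remaining)
```

Each group keeps the row with the smallest zname, which for the A1 group is lmn rather than xyz. Any single survivor satisfies the question, but fully identical rows need a physical row identifier instead of a column comparison.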

+2

This is a variant of @a_horse_with_no_name's answer for DB2 on iSeries, without using the GROUP BY and IN clauses. It really works.

    DELETE FROM the_table a
    WHERE rrn(a) < (
        SELECT MAX(rrn(b))
        FROM the_table b
        WHERE a.one = b.one
          AND a.two = b.two
          AND a.three = b.three
    )
+1
Please take a backup of the table before deleting the data. The following works only if the name values are unique across the table:

    DELETE FROM the_table
    WHERE name NOT IN (
        SELECT MIN(name)
        FROM the_table
        GROUP BY one, two, three
    )

you can use

  DELETE from TABLE Group by one,two,three Having count(*) > 2; 
0
    DELETE FROM Table_Name
    WHERE Table_Name_ID NOT IN (
        SELECT MAX(Table_Name_ID)
        FROM Table_Name
        GROUP BY one, two, three
    )

Here one, two and three are the columns that repeat, and Table_Name_ID is the primary key.

0

This is a variant of levenlevi's answer that does not require a primary key on the table (I can't check the syntax right now).

    DELETE FROM the_table
    WHERE rid_bit(the_table) NOT IN (
        SELECT MAX(rid_bit(the_table))
        FROM the_table
        GROUP BY one, two, three
    )

I think rid_bit() is not supported on iSeries, but rrn() serves the same purpose.
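The "keep the row with the highest row identifier per group" idea can be tried with sqlite3, with one loud assumption: SQLite's implicit rowid plays the role that rid_bit() or rrn() plays in DB2. This is not DB2 syntax, only an illustration of the same logic on a table without a primary key.

```python
import sqlite3

# Assumption: SQLite's implicit rowid stands in for DB2's rid_bit() / rrn().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (one TEXT, two TEXT, three TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?)",
    [
        ("A1", "B1", "C1", "xyz"),
        ("A1", "B1", "C1", "pqr"),
        ("A1", "B1", "C1", "lmn"),
        ("A2", "B2", "C2", "abc"),
        ("A2", "B2", "C2", "def"),
        ("A3", "B3", "C3", "ghi"),
    ],
)

# Keep only the row with the highest row identifier in each group.
deleted = conn.execute(
    """
    DELETE FROM the_table
    WHERE rowid NOT IN (
        SELECT MAX(rowid)
        FROM the_table
        GROUP BY one, two, three
    )
    """
).rowcount

remaining = conn.execute(
    "SELECT one, two, three, name FROM the_table ORDER BY rowid"
).fetchall()

print(deleted)
print(remaining)
```

Because the row identifier is unique per physical row, this variant also handles rows that are identical in every column. With the sample data it keeps the last inserted row of each group.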

0

For others using a very old version of DB2 SQL: a combination of these answers helped me identify and remove duplicates from two batches that had been sent twice.

    SELECT *
    FROM LIBRARY.TABLE a
    WHERE a.batch IN (115131, 115287)
      AND EXISTS (
          SELECT 1
          FROM LIBRARY.TABLE d
          WHERE d.batch IN (115131, 115287)
            AND a.one = d.one
            AND a.two = d.two
            AND a.three = d.three
          GROUP BY d.one, d.two, d.three
          HAVING count(*) <> 1
      )
      AND RRN(a) > (
          SELECT MIN(RRN(b))
          FROM LIBRARY.TABLE b
          WHERE b.batch IN (115131, 115287)
            AND a.one = b.one
            AND a.two = b.two
            AND a.three = b.three
      );
0