Oracle: identifying duplicates in a table without an index

When I try to create a unique index on a large table, I get a uniqueness violation error. The unique index in this case is a composite key of 4 columns.

Is there an efficient way to identify the duplicates other than:

select col1, col2, col3, col4, count(*) from Table1 group by col1, col2, col3, col4 having count(*) > 1 

The explain plan for the query above shows a full table scan with an extremely high cost, so I just want to find out whether there is another way.

Thanks!

+6
oracle plsql duplicates
5 answers

First, try creating a non-unique index on these four columns. Building it takes O(n log n) time, but it also brings the time needed to run the select down to O(n log n).

You are in a bit of a bind here: however you slice it, the entire table must be read at least once. The naïve algorithm runs in O(n^2) time unless the query optimizer is smart enough to build a temporary index or table.
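The index-first idea could be sketched roughly as follows. The index name table1_dup_ix is hypothetical; the table and column names come from the question. One caveat worth checking: an Oracle B-tree index does not store rows in which every indexed column is NULL, so unless at least one of the four columns is declared NOT NULL the optimizer may still choose a full table scan for the grouping query.

```sql
-- Hypothetical index name; a plain (non-unique) index builds even with duplicates present.
CREATE INDEX table1_dup_ix ON Table1 (col1, col2, col3, col4);

-- Same duplicate check as in the question; with the index in place the
-- optimizer may be able to answer it from the index instead of the table.
SELECT col1, col2, col3, col4, COUNT(*) AS dup_count
FROM   Table1
GROUP  BY col1, col2, col3, col4
HAVING COUNT(*) > 1;
```

Compare the explain plan before and after creating the index to see whether it is actually used.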

+7

You can use the EXCEPTIONS INTO clause to catch duplicate rows.

If you do not have an EXCEPTIONS table yet, create it using the supplied script:

 SQL> @$ORACLE_HOME/rdbms/admin/utlexcpt.sql 

Now you can try to create a unique constraint like this:

 alter table Table1 add constraint tab1_uq UNIQUE (col1, col2, col3, col4)
   exceptions into exceptions
 /

This will fail with an error, but your EXCEPTIONS table will now contain a list of all rows whose keys are duplicated, identified by ROWID. This gives you a basis for deciding what to do with the duplicates (delete, renumber, whatever).

Edit

As already noted, you have to pay the cost of scanning the table once. This approach gives you a persistent set of the duplicate rows, and ROWID is the fastest way to access any given row.
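As a rough sketch of how the recorded ROWIDs might then be used, the query below joins the exceptions table back to Table1. It assumes the exceptions table has the ROW_ID and TABLE_NAME columns that the standard script creates, and that the table name is stored in upper case; adjust if your exceptions table differs.

```sql
-- List the full rows behind each recorded ROWID, grouped visually by key.
-- Assumes exceptions.row_id and exceptions.table_name exist as created by the script.
SELECT t.rowid AS rid, t.col1, t.col2, t.col3, t.col4
FROM   Table1 t
WHERE  t.rowid IN (SELECT e.row_id
                   FROM   exceptions e
                   WHERE  e.table_name = 'TABLE1')
ORDER  BY t.col1, t.col2, t.col3, t.col4;
```

Because the lookup is by ROWID, this step does not need another scan keyed on the four columns.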

+2

Since there is no index on these columns, this query will have to perform a full table scan. There is no other way to do it unless one or more of these columns is already indexed.

You could create the index as a non-unique index and then run a query to identify the duplicate rows (which should be very fast once the index exists). But I doubt that the combined time needed to create a non-unique index and then run the query will be less than simply running the query without an index.
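If the full scan has to be paid for anyway, one option is to make that single pass return every duplicated row together with its ROWID, rather than only the duplicated key values. The following is a sketch using an analytic count; it is not expected to be faster than the GROUP BY query, only more directly usable for cleanup. The aliases rid and copies are made up for illustration.

```sql
-- One pass over Table1: tag each row with how many rows share its key,
-- then keep only rows whose key occurs more than once.
SELECT rid, col1, col2, col3, col4, copies
FROM  (SELECT t.rowid AS rid,
              t.col1, t.col2, t.col3, t.col4,
              COUNT(*) OVER (PARTITION BY t.col1, t.col2, t.col3, t.col4) AS copies
       FROM   Table1 t)
WHERE  copies > 1
ORDER  BY col1, col2, col3, col4;
```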

+1

In effect, you need to look for a duplicate of every row in the table. There is no way to do this efficiently without an index.

+1

Unfortunately, I do not think there is a faster way.

0
