Oracle Self-Join in several possible column matches - CONNECT BY?

Question

Oracle Self-Join in several possible column matches - CONNECT BY?

I have a request requirement from ----. Trying to solve it with CONNECT BY , but it may not seem like I need the results.

Table (simplified):

 create table CSS.USER_DESC ( USER_ID VARCHAR2(30) not null, NEW_USER_ID VARCHAR2(30), GLOBAL_HR_ID CHAR(8) ) -- USER_ID is the primary key -- NEW_USER_ID is a self-referencing key -- GLOBAL_HR_ID is an ID field from another system

There are two sources of user data (datafeeds) ... I have to keep track of errors in any of them when updating information.

Scenarios:

The user is given a new user ID ... The old record is set accordingly and deactivated (usually a renaming for contractors who become full-time)
The user leaves and returns later. HR cannot send us the old user ID so we can connect accounts.
The system screwed and did not set a new user ID on the old record.
Data can be bad in hundreds of other ways.

I need to know that the same user, and I can not rely on the name or other fields ... they differ between the corresponding entries:

 ROOTUSER NUMROOTS NODELEVEL ISLEAF USER_ID NEW_USER_ID GLOBAL_HR_ID USERTYPE LAST_NAME FIRST_NAME ----------------------------------------------------------------------------------------------------------------------------- EX0T1100 2 1 0 EX0T1100 EX000005 CONTRACTOR VON DER HAAVEN VERONICA EX0T1100 2 2 1 EX000005 00126121 EMPLOYEE HAAVEN, VON DER VERONICA GL110456 1 1 1 GL110456 00126121 EMPLOYEE VONDERHAAVEN VERONICA

EXOT1100 and EX000005 correctly associated with the NEW_USER_ID field. The renaming took place before the global identifiers HR appeared, therefore EX0T1100 does not exist. EX000005 been assigned a new user ID, 'GL110456', and two of them are associated only with the same global HR identifier.

Data cleaning is not an option.

The request so far:

 select connect_by_root cud.user_id RootUser, count(connect_by_root cud.user_id) over (partition by connect_by_root cud.user_id) NumRoots, level NodeLevel, connect_by_isleaf IsLeaf, --connect_by_iscycle IsCycle, cud.user_id, cud.new_user_id, cud.global_hr_id, cud.user_type_code UserType, ccud.last_name, cud.first_name from css.user_desc cud where cud.user_id in ('EX000005','EX0T1100','GL110456') -- Using this so I don't get sub-users in my list of root users... -- It complicates the matches with GLOBAL_HR_ID, however start with cud.user_id not in (select cudsub.new_user_id from css.user_desc cudsub where cudsub.new_user_id is not null) connect by nocycle (prior new_user_id = user_id);

I tried various CONNECT BY clauses, but none of them are right:

 -- As a multiple CONNECT BY connect by nocycle (prior global_hr_id = global_hr_id) connect by nocycle (prior new_user_id = user_id) -- As a compound CONNECT BY connect by nocycle ((prior new_user_id = user_id) or (prior global_hr_id = global_hr_id and user_id != prior user_Id))

UNIONing two CONNECT BY queries do not work ... I am not getting alignment.

Here is what I would like to see ... I'm fine with a set of results that I have to distinguish and use as a subquery. I'm fine with any of the three user IDs in the ROOTUSER column ... I just need to know that they are the same users.

 ROOTUSER NUMROOTS NODELEVEL ISLEAF USER_ID NEW_USER_ID GLOBAL_HR_ID USERTYPE LAST_NAME FIRST_NAME ----------------------------------------------------------------------------------------------------------------------------- EX0T1100 3 1 0 EX0T1100 EX000005 CONTRACTOR VON DER HAAVEN VERONICA EX0T1100 3 2 1 EX000005 00126121 EMPLOYEE HAAVEN, VON DER VERONICA EX0T1100 3 (2 or 3) 1 GL110456 00126121 EMPLOYEE VONDERHAAVEN VERONICA

Ideas?

Update

Nicholas, your code is very similar to the correct track ... at the moment lead(user_id) over (partition by global_hr_id) gets false hits when global_hr_id is NULL. For instance:

 USER_ID NEW_USER_ID CHAINNEWUSER GLOBAL_HR_ID LAST_NAME FIRST_NAME FP004468 FP004469 AARON TIMOTHY FP004469 FOONG KOK WAH

I often wanted to treat zeros as separate entries in a section, but I never found a way to make ignore nulls work. This did what I wanted:

 decode(global_hr_id,null,null,lead(cud.user_id ignore nulls) over (partition by global_hr_id order by user_id)

... but there must be a better way. I could not bring the request to completion of full-scale user data (about 40,000 users). Both global_hr_id and NEW_USER_ID indexed.

Update

The request returns after about 750 seconds ... long but manageable. It returns 93k records because I don’t have a good way to filter out level 2 exits from the root - you have start with global_hr_id is null , but unfortunately this is not always the case. I need to think about how to filter them out.

I tried to add a more complicated start with sentences earlier, but I find it separately, they start <1 second ... together, they take 90 minutes>. <

Thanks again for helping ... evade this.

+3

oracle self-join connect-by

James king Oct 14 '12 at 20:44

source share

1 answer

Nick Krasnov · Accepted Answer · 2012-10-15T02:38:06+0000

You have provided sample data for only one user. It would be better to have a little more. Anyway, let's look at something like this.

 SQL> with user_desc(USER_ID, NEW_USER_ID, GLOBAL_HR_ID)as( 2 select 'EX0T1100', 'EX000005', null from dual union all 3 select 'EX000005', null, 00126121 from dual union all 4 select 'GL110456', null, 00126121 from dual 5 ) 6 select connect_by_root(user_id) rootuser 7 , count(connect_by_root(user_id)) over(partition by connect_by_root(user_id)) numroot 8 , level nodlevel 9 , connect_by_isleaf 10 , user_id 11 , new_user_id 12 , global_hr_id 13 from (select user_id 14 , coalesce(new_user_id, usr) new_user_id1 15 , new_user_id 16 , global_hr_id 17 from ( select user_id 18 , new_user_id 19 , global_hr_id 20 , decode(global_hr_id,null,null,lead(user_id) over (partition by global_hr_id order by user_id)) usr 21 from user_desc 22 ) 23 ) 24 start with global_hr_id is null 25 connect by prior new_user_id1 = user_id 26 ;

Result:

 ROOTUSER NUMROOT NODLEVEL CONNECT_BY_ISLEAF USER_ID NEW_USER_ID GLOBAL_HR_ID -------- ---------- ---------- ----------------- -------- ----------- ------------ EX0T1100 3 1 0 EX0T1100 EX000005 EX0T1100 3 2 0 EX000005 126121 EX0T1100 3 3 1 GL110456 126121

Oracle Self-Join in several possible column matches - CONNECT BY?

More articles: