I have a request requirement from ----. Trying to solve it with CONNECT BY , but it may not seem like I need the results.
Table (simplified):
create table CSS.USER_DESC ( USER_ID VARCHAR2(30) not null, NEW_USER_ID VARCHAR2(30), GLOBAL_HR_ID CHAR(8) )
There are two sources of user data (datafeeds) ... I have to keep track of errors in any of them when updating information.
Scenarios:
- The user is given a new user ID ... The old record is set accordingly and deactivated (usually a renaming for contractors who become full-time)
- The user leaves and returns later. HR cannot send us the old user ID so we can connect accounts.
- The system screwed and did not set a new user ID on the old record.
- Data can be bad in hundreds of other ways.
I need to know that the same user, and I can not rely on the name or other fields ... they differ between the corresponding entries:
ROOTUSER NUMROOTS NODELEVEL ISLEAF USER_ID NEW_USER_ID GLOBAL_HR_ID USERTYPE LAST_NAME FIRST_NAME ----------------------------------------------------------------------------------------------------------------------------- EX0T1100 2 1 0 EX0T1100 EX000005 CONTRACTOR VON DER HAAVEN VERONICA EX0T1100 2 2 1 EX000005 00126121 EMPLOYEE HAAVEN, VON DER VERONICA GL110456 1 1 1 GL110456 00126121 EMPLOYEE VONDERHAAVEN VERONICA
EXOT1100 and EX000005 correctly associated with the NEW_USER_ID field. The renaming took place before the global identifiers HR appeared, therefore EX0T1100 does not exist. EX000005 been assigned a new user ID, 'GL110456', and two of them are associated only with the same global HR identifier.
Data cleaning is not an option.
The request so far:
select connect_by_root cud.user_id RootUser, count(connect_by_root cud.user_id) over (partition by connect_by_root cud.user_id) NumRoots, level NodeLevel, connect_by_isleaf IsLeaf, --connect_by_iscycle IsCycle, cud.user_id, cud.new_user_id, cud.global_hr_id, cud.user_type_code UserType, ccud.last_name, cud.first_name from css.user_desc cud where cud.user_id in ('EX000005','EX0T1100','GL110456') -- Using this so I don't get sub-users in my list of root users... -- It complicates the matches with GLOBAL_HR_ID, however start with cud.user_id not in (select cudsub.new_user_id from css.user_desc cudsub where cudsub.new_user_id is not null) connect by nocycle (prior new_user_id = user_id);
I tried various CONNECT BY clauses, but none of them are right:
-- As a multiple CONNECT BY connect by nocycle (prior global_hr_id = global_hr_id) connect by nocycle (prior new_user_id = user_id) -- As a compound CONNECT BY connect by nocycle ((prior new_user_id = user_id) or (prior global_hr_id = global_hr_id and user_id != prior user_Id))
UNIONing two CONNECT BY queries do not work ... I am not getting alignment.
Here is what I would like to see ... I'm fine with a set of results that I have to distinguish and use as a subquery. I'm fine with any of the three user IDs in the ROOTUSER column ... I just need to know that they are the same users.
ROOTUSER NUMROOTS NODELEVEL ISLEAF USER_ID NEW_USER_ID GLOBAL_HR_ID USERTYPE LAST_NAME FIRST_NAME ----------------------------------------------------------------------------------------------------------------------------- EX0T1100 3 1 0 EX0T1100 EX000005 CONTRACTOR VON DER HAAVEN VERONICA EX0T1100 3 2 1 EX000005 00126121 EMPLOYEE HAAVEN, VON DER VERONICA EX0T1100 3 (2 or 3) 1 GL110456 00126121 EMPLOYEE VONDERHAAVEN VERONICA
Ideas?
Update
Nicholas, your code is very similar to the correct track ... at the moment lead(user_id) over (partition by global_hr_id) gets false hits when global_hr_id is NULL. For instance:
USER_ID NEW_USER_ID CHAINNEWUSER GLOBAL_HR_ID LAST_NAME FIRST_NAME FP004468 FP004469 AARON TIMOTHY FP004469 FOONG KOK WAH
I often wanted to treat zeros as separate entries in a section, but I never found a way to make ignore nulls work. This did what I wanted:
decode(global_hr_id,null,null,lead(cud.user_id ignore nulls) over (partition by global_hr_id order by user_id)
... but there must be a better way. I could not bring the request to completion of full-scale user data (about 40,000 users). Both global_hr_id and NEW_USER_ID indexed.
Update
The request returns after about 750 seconds ... long but manageable. It returns 93k records because I donβt have a good way to filter out level 2 exits from the root - you have start with global_hr_id is null , but unfortunately this is not always the case. I need to think about how to filter them out.
I tried to add a more complicated start with sentences earlier, but I find it separately, they start <1 second ... together, they take 90 minutes>. <
Thanks again for helping ... evade this.