NOT IN query ... odd results

I need a list of the users in one database that are not listed as a new_user_id in another. Both databases have 112,815 users; user_id is the key in all of the queried tables.


Query No. 1 works and gives me the 111,327 users that are not referenced as a new_user_id. But it has to query the same data twice.

 -- 111,327 GSU users are NOT listed as a CSS new user
 -- 1,488 GSU users ARE listed as a new user in CSS
 --
 select count(gup.user_id)
 from gsu.user_profile gup
 join (select cud.user_id, cud.new_user_id, cud.user_type_code
       from css.user_desc cud) cudsubq
   on gup.user_id = cudsubq.user_id
 where gup.user_id not in (select cud.new_user_id
                           from css.user_desc cud
                           where cud.new_user_id is not null);


Query No. 2 would be perfect ... and I'm really surprised that it is syntactically accepted. But it gives me a result that does not make sense.

 -- This gives me 1,505 users... I've checked, and they are not
 -- referenced as new_user_ids in CSS, but I don't know why the ones
 -- that were excluded were excluded.
 --
 -- Where are the missing 109,822, and what excluded them?
 --
 select count(gup.user_id)
 from gsu.user_profile gup
 join (select cud.user_id, cud.new_user_id, cud.user_type_code
       from css.user_desc cud) cudsubq
   on gup.user_id = cudsubq.user_id
 where gup.user_id not in (cudsubq.new_user_id);


What is the where clause in the second query actually doing, and why does it exclude 109,822 records from the results?


Note: the queries above are a simplified version of what I'm really doing. There are other / better ways to write them ... they are just the part of the larger query that is giving me problems.

+3
4 answers

Read this: http://asktom.oracle.com/pls/asktom/f?p=100:11 index :: NO :: P11_QUESTION_ID : 442029737684

As far as I understand, your cudsubq.new_user_id can be NULL even though the two tables are joined on user_id, and NOT IN returns no rows when the set it is compared against contains NULL values. Consider the example from the article:

 select * from dual where dummy not in ( NULL ) 

This does not return any records. Try using NOT EXISTS or a different type of join. Here is a good source: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html

The fourth example there is the one you need:

 SELECT COUNT(descr.user_id) FROM user_profile prof LEFT OUTER JOIN user_desc descr ON prof.user_id = descr.user_id WHERE descr.new_user_id IS NULL OR descr.new_user_id != prof.user_id 
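
To make the NOT IN versus NOT EXISTS difference concrete, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for Oracle, and the four-row tables are made up for illustration; the NULL handling shown is standard SQL three-valued logic, so Oracle should behave the same way for this case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical miniature versions of the tables in the question.
    CREATE TABLE user_profile (user_id INTEGER PRIMARY KEY);
    CREATE TABLE user_desc (user_id INTEGER PRIMARY KEY, new_user_id INTEGER);
    INSERT INTO user_profile VALUES (1), (2), (3), (4);
    -- User 1 points at user 3 as its new user; the rest have NULL.
    INSERT INTO user_desc VALUES (1, 3), (2, NULL), (3, NULL), (4, NULL);
""")

# NOT IN against a subquery that can return NULL: every comparison is
# either false or unknown, so no row passes the filter.
not_in = conn.execute("""
    SELECT user_id FROM user_profile
    WHERE user_id NOT IN (SELECT new_user_id FROM user_desc)
    ORDER BY user_id
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULLs in
# new_user_id are simply never matched.
not_exists = conn.execute("""
    SELECT p.user_id FROM user_profile p
    WHERE NOT EXISTS (SELECT 1 FROM user_desc d
                      WHERE d.new_user_id = p.user_id)
    ORDER BY p.user_id
""").fetchall()

print(not_in)      # []
print(not_exists)  # [(1,), (2,), (4,)]
```

User 3 is the only one referenced as a new_user_id, so NOT EXISTS returns the other three, while the unfiltered NOT IN returns nothing at all.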
+4

The second query is semantically different. In this case

 where gup.user_id not in (cudsubq.new_user_id) 

cudsubq.new_user_id is treated as an expression (doc: IN condition), not as a subquery, so the whole clause is basically equivalent to

 where gup.user_id != cudsubq.new_user_id 

So your first query literally asks: "show me all the users in GUP who also have CSS entries and whose GUP.ID does not match ANY NOT NULL NEW_ID in CSS".

The second query, however, asks: "show me all the users in GUP who also have CSS entries and whose GUP.ID is not equal to their RESPECTIVE NULLABLE (there is no is not null here, remember?) CSS.NEW_ID value."

And any (not) in check, or any equality / inequality comparison, against NULL does not evaluate to true.

 12:07:54 SYSTEM@oars_sandbox> select * from dual where 1 not in (null, 2, 3, 4);

 no rows selected

 Elapsed: 00:00:00.00

This is where you lose your rows. I would probably rewrite your second query so that the where clause reads where cudsubq.new_user_id is null, assuming non-shared users have a null new_user_id.
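
The difference between the two where clauses can be reproduced on a tiny data set. This is only a sketch under assumptions: SQLite (via Python's sqlite3) stands in for Oracle, the five-row tables are invented, and the helper `ids` is a hypothetical convenience function, not part of either query in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: users 1 and 2 have a new_user_id, 3-5 do not.
    CREATE TABLE user_profile (user_id INTEGER PRIMARY KEY);
    CREATE TABLE user_desc (user_id INTEGER PRIMARY KEY, new_user_id INTEGER);
    INSERT INTO user_profile VALUES (1), (2), (3), (4), (5);
    INSERT INTO user_desc VALUES (1, 3), (2, 4), (3, NULL), (4, NULL), (5, NULL);
""")

def ids(where_clause):
    """Run the joined query from the question with the given WHERE clause."""
    sql = f"""
        SELECT gup.user_id
        FROM user_profile gup
        JOIN user_desc cudsubq ON gup.user_id = cudsubq.user_id
        WHERE {where_clause}
        ORDER BY gup.user_id
    """
    return [row[0] for row in conn.execute(sql)]

# Query 2 style: an expression list holding the joined row's own value.
expr_list = ids("gup.user_id NOT IN (cudsubq.new_user_id)")
# The equivalent plain inequality.
inequality = ids("gup.user_id != cudsubq.new_user_id")
# Query 1 style: a real subquery over all non-NULL new_user_id values.
subquery = ids("""gup.user_id NOT IN (SELECT new_user_id FROM user_desc
                                      WHERE new_user_id IS NOT NULL)""")

print(expr_list)   # [1, 2]
print(inequality)  # [1, 2]
print(subquery)    # [1, 2, 5]
```

The expression-list form keeps only the rows whose own new_user_id is non-NULL and different from user_id, which matches the pattern in the question: a small count close to the number of rows that actually carry a new_user_id.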

+1

The second select compares gup.user_id with the new_user_id of the current joined row only. You can rewrite the query to get the same result.

 select count(gup.user_id) from gsu.user_profile gup join (select cud.user_id, cud.new_user_id, cud.user_type_code from css.user_desc cud) cudsubq on gup.user_id = cudsubq.user_id where gup.user_id != cudsubq.new_user_id or cudsubq.new_user_id is null; 

You mentioned that you are comparing a list of users in one database with a list of users in another. So you do need to query the data twice, and you are not querying the same data each time. Perhaps you can use the minus operator to avoid using in:

 select count(gup.user_id) from gsu.user_profile gup join (select cud.user_id from css.user_desc cud minus select cud.new_user_id from css.user_desc cud) cudsubq on gup.user_id = cudsubq.user_id; 
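
A small sketch of the set-difference idea, under stated assumptions: it runs on SQLite through Python's sqlite3, where the operator is spelled EXCEPT rather than Oracle's MINUS, and the five-row tables are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: users 3 and 4 are referenced as new_user_id.
    CREATE TABLE user_profile (user_id INTEGER PRIMARY KEY);
    CREATE TABLE user_desc (user_id INTEGER PRIMARY KEY, new_user_id INTEGER);
    INSERT INTO user_profile VALUES (1), (2), (3), (4), (5);
    INSERT INTO user_desc VALUES (1, 3), (2, 4), (3, NULL), (4, NULL), (5, NULL);
""")

# EXCEPT (MINUS in Oracle) removes every user_id that also appears as a
# new_user_id. The NULL on the right-hand side matches no user_id, so it
# does not wipe out the result the way NOT IN would.
set_difference = [row[0] for row in conn.execute("""
    SELECT gup.user_id
    FROM user_profile gup
    JOIN (SELECT user_id FROM user_desc
          EXCEPT
          SELECT new_user_id FROM user_desc) cudsubq
      ON gup.user_id = cudsubq.user_id
    ORDER BY gup.user_id
""")]

print(set_difference)  # [1, 2, 5]
```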
0

You want the user_id values from the gup table that do not match any new_user_id in the cud table, right? This looks like a job for a left join:

 SELECT count(gup.user_id) FROM gsu.user_profile gup LEFT JOIN css.user_desc cud ON gup.user_id = cud.new_user_id WHERE cud.new_user_id is NULL 

The join keeps all gup rows, matching them with new_user_id where possible. The WHERE clause keeps only the rows that have no matching row in cud.

(Apologies if you already know this and are only interested in the behavior of the not in query.)
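
The two steps of this left join can be seen separately in a small sketch. It assumes SQLite (via Python's sqlite3) instead of Oracle, invented five-row tables, and a hypothetical `referenced_by` column alias added only to show which cud row matched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: users 3 and 4 are referenced as new_user_id.
    CREATE TABLE user_profile (user_id INTEGER PRIMARY KEY);
    CREATE TABLE user_desc (user_id INTEGER PRIMARY KEY, new_user_id INTEGER);
    INSERT INTO user_profile VALUES (1), (2), (3), (4), (5);
    INSERT INTO user_desc VALUES (1, 3), (2, 4), (3, NULL), (4, NULL), (5, NULL);
""")

# Step 1: the left join alone keeps every gup row; unmatched rows get
# NULL (None in Python) in the cud columns.
joined = conn.execute("""
    SELECT gup.user_id, cud.user_id AS referenced_by
    FROM user_profile gup
    LEFT JOIN user_desc cud ON gup.user_id = cud.new_user_id
    ORDER BY gup.user_id
""").fetchall()

# Step 2: filtering on the NULL side leaves only unreferenced users.
unreferenced = [row[0] for row in conn.execute("""
    SELECT gup.user_id
    FROM user_profile gup
    LEFT JOIN user_desc cud ON gup.user_id = cud.new_user_id
    WHERE cud.new_user_id IS NULL
    ORDER BY gup.user_id
""")]

print(joined)        # [(1, None), (2, None), (3, 1), (4, 2), (5, None)]
print(unreferenced)  # [1, 2, 5]
```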

0

Source: https://habr.com/ru/post/928053/

