Oracle - table alias and join null evaluation

I was just trying to give an example to explain how NULL in Oracle can lead to "unexpected" behavior, but I found something that I did not expect ...

Setup:

    create table tabNull (val varchar2(10), descr varchar2(100));
    insert into tabNull values (null, 'NULL VALUE');
    insert into tabNull values ('A', 'ONE CHAR');

This gives what I expected:

    SQL> select * from tabNull T1 inner join tabNull T2 using(val);

    VAL        DESCR                DESCR
    ---------- -------------------- --------------------
    A          ONE CHAR             ONE CHAR
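
The single row is what SQL's NULL comparison rules predict: an equality test involving NULL is never true, so the NULL row cannot match itself. As a minimal sketch, assuming the tabNull table from the setup above (the column aliases t1_descr and t2_descr are introduced here only for illustration), the same join can be written with an explicit ON clause:

```sql
-- Explicit form of the aliased USING join.
-- t1_descr / t2_descr are illustrative aliases, not part of the original query.
select t1.val,
       t1.descr as t1_descr,
       t2.descr as t2_descr
from   tabNull t1
       inner join tabNull t2
               on t1.val = t2.val;   -- never true when either side is NULL

-- The same rule in isolation: equality with NULL matches no rows,
-- so this count is 0 even though one row has val set to NULL.
select count(*) from tabNull where val = null;
```

With the two sample rows, the join should return only the 'A' row paired with itself.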

If I remove the table aliases, I get:

    SQL> select * from tabNull inner join tabNull using(val);

    VAL        DESCR                DESCR
    ---------- -------------------- --------------------
    A          ONE CHAR             ONE CHAR
    A          ONE CHAR             ONE CHAR

and this is completely unexpected for me.

The execution plans of the two queries show the reason. With table aliases, Oracle performs a HASH JOIN with the access predicate T1.val = T2.val:

    ------------------------------------------------------------------------------
    | Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
    ------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |         |     1 |   118 |     7  (15)| 00:00:01 |
    |*  1 |  HASH JOIN         |         |     1 |   118 |     7  (15)| 00:00:01 |
    |   2 |   TABLE ACCESS FULL| TABNULL |     2 |   118 |     3   (0)| 00:00:01 |
    |   3 |   TABLE ACCESS FULL| TABNULL |     2 |   118 |     3   (0)| 00:00:01 |
    ------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

       1 - access("T1"."VAL"="T2"."VAL")

Without aliases, it first filters one occurrence of the table for non-null values, which selects only one row, and then performs a MERGE JOIN CARTESIAN with the second occurrence, which gives two rows. Even if that were correct, I would expect a true Cartesian result, yet there is no row with DESCR = 'NULL VALUE'.

    --------------------------------------------------------------------------------
    | Id  | Operation            | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT     |         |     2 |   118 |     6   (0)| 00:00:01 |
    |   1 |  MERGE JOIN CARTESIAN|         |     2 |   118 |     6   (0)| 00:00:01 |
    |*  2 |   TABLE ACCESS FULL  | TABNULL |     1 |    59 |     3   (0)| 00:00:01 |
    |   3 |   BUFFER SORT        |         |     2 |       |     3   (0)| 00:00:01 |
    |   4 |    TABLE ACCESS FULL | TABNULL |     2 |       |     3   (0)| 00:00:01 |
    --------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

       2 - filter("TABNULL"."VAL" IS NOT NULL)

Is this somehow correct / expected? Is the Cartesian result even stranger than the number of rows returned? Am I misunderstanding the plans, or am I missing something so obvious that I cannot see it?

null sql oracle
4 answers

According to http://docs.oracle.com/javadb/10.10.1.2/ref/rrefsqljusing.html, using(val) translates here to ON tabnull.val = tabnull.val, so the query becomes:

 select tabNull.*, tabNull.descr from tabNull inner join tabNull on tabNull.val = tabNull.val; 

Further, in order to build a plan, Oracle must (in effect) assign a different alias to each join member, but it sees no reason to use the second alias anywhere in the SELECT list or the ON clause. So the query is effectively:

 select t1.*, t1.descr from tabNull t1 inner join tabNull t2 on t1.val = t1.val; 

Plan

    --------------------------------------------------------------------------------
    | Id  | Operation            | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT     |         |     2 |    28 |     4   (0)| 00:00:01 |
    |   1 |  MERGE JOIN CARTESIAN|         |     2 |    28 |     4   (0)| 00:00:01 |
    |*  2 |   TABLE ACCESS FULL  | TABNULL |     1 |    14 |     2   (0)| 00:00:01 |
    |   3 |   BUFFER SORT        |         |     2 |       |     2   (0)| 00:00:01 |
    |   4 |    TABLE ACCESS FULL | TABNULL |     2 |       |     2   (0)| 00:00:01 |
    --------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

       2 - filter("T1"."VAL" IS NOT NULL)
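
To see what a genuine Cartesian join under that filter would return, the rewritten query can be run with two distinct aliases and distinct output column names. This is only a sketch against the tabNull table from the question; the aliases t1 and t2 and the column names t1_descr and t2_descr are chosen here for illustration.

```sql
-- t1.val = t1.val is true exactly when t1.val is not NULL,
-- so the ON clause restricts t1 only and never relates t1 to t2.
select t1.val,
       t1.descr as t1_descr,
       t2.descr as t2_descr
from   tabNull t1
       inner join tabNull t2
               on t1.val = t1.val;

-- Equivalent formulation: filter t1, then pair it with every row of t2.
select t1.val,
       t1.descr as t1_descr,
       t2.descr as t2_descr
from   tabNull t1
       cross join tabNull t2
where  t1.val is not null;
```

With the two sample rows, both statements should return two rows: the 'A' row of t1 paired once with 'ONE CHAR' and once with 'NULL VALUE' from t2. That differs from the output in the question, where the second DESCR column shows 'ONE CHAR' twice.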

EDIT: I say below that the syntax is illegal; on further thought, that was BS on my part. I don't know whether it really is illegal (I can't point to where the language definition requires aliases for a self-join). I still believe the explanation below is probably correct, whether what you see is a "bug" or the "undefined behavior" I mention below.

*

The syntax is illegal (you knew this - you were just curious to see what would happen, and whether the result could be explained). I agree with jarlh that you should have received an error message. Obviously, Oracle did not code it that way.

Since this is invalid syntax, what you see cannot be called a bug (so I disagree with Nick's comment). It is undefined behavior: when you use syntax that is not supported by the Oracle language definition, you can get all sorts of crazy results, for which Oracle bears no responsibility.

From this point of view, is there any explanation for what you see? I believe this really is a Cartesian join, not a union as Nick suggested.

Put yourself in the optimizer's shoes. It sees the first table in the FROM list and scans it; so far, so good.

Then it reads the second table and now has a column list like this:

tabNULL.val, tabNULL.descr, tabNULL.val, tabNULL.descr

The join condition is tabNULL.val = tabNULL.val.

The optimizer is dumb, not smart. Unlike you, it does not realize at this point that tabNULL stands for two different incarnations of the table. It thinks tabNULL.val on both sides of the equation is the SAME value, with both references pointing at the first "incarnation" of the table. The only case in which the condition fails is when tabNULL.val is NULL, so the optimizer REWRITES the condition as tabNULL.val IS NOT NULL.

Only the FIRST table is checked for tabNULL.val IS NOT NULL; the optimizer does not "know" that tabNULL.val appears in the list again and may hold DIFFERENT values! Then the join is performed. At that point there are no other conditions, so BOTH rows of the second incarnation of the table produce rows in the join, each paired with A, ONE CHAR from the first table.

Then, in the projection, only the FIRST tabNULL.val is read, and it fills BOTH columns of the output. You ask the query engine to return tabNULL.val twice and, in your mind, the two come from different places, but there is only one memory location labeled tabNULL.val, and it holds what was in the first table.

Of course, very few people know with certainty what the optimizer and the query engine actually do, but in this case I think this is a pretty safe guess.


The USING keyword is new to me, but from what I have read, it is just a way to simplify the syntax of an SQL join. (See "Oracle USING keyword".)

    select * from tabNull T1 inner join tabNull T2 using(val);

is equivalent to:

    select * from tabNull T1 inner join tabNull T2 on T1.val = T2.val;

and

    select * from tabNull inner join tabNull using(val);

is equivalent to:

    select * from tabNull inner join tabNull on tabNull.val = tabNull.val;

The problem is that in the second query, the table names in the join condition tabNull.val = tabNull.val are not unique.

This is bad syntax that would lead to an error if traditional join syntax were used.

I assume that Oracle performed a full cross-product of the two tables (which doubled all the rows) and then eliminated the NULLs, because USING has to use an equijoin (i.e. equals, "=") and NULL is not equal to anything.


Sorry, I don't think this really counts as an answer. It is basically just a comment in response to this part of your post:

Is the Cartesian result even stranger than the number of rows returned?

Each step of the plan has a "projection", which is the list of columns / expressions that the step outputs. What happens is that the identical aliases cause Oracle's projection to merge what should be two projected columns into only one.

This is easier to see if you use two separate tables in your example and add a couple of uniquely named columns to show what happens, for example:

    create table tabNull1 (val varchar2(10), descr varchar2(100), t1_real_descr varchar2(100));
    insert into tabNull1 values (null, 'T1-NULL VALUE', 'T1-NULL VALUE');
    insert into tabNull1 values ('A', 'T1-ONE CHAR', 'T1-ONE CHAR');

    create table tabNull2 (val varchar2(10), descr varchar2(100), t2_real_descr varchar2(100));
    insert into tabNull2 values (null, 'T2-NULL VALUE', 'T2-NULL VALUE');
    insert into tabNull2 values ('A', 'T2-ONE CHAR', 'T2-ONE CHAR');

    select * from tabNull1 t inner join tabNull2 t using(val);

    VAL    DESCR            T1_REAL_DESCR     DESCR_1       T2_REAL_DESCR
    ------ ---------------- ----------------- ------------- -----------------
    A      T2-ONE CHAR      T1-NULL VALUE     T2-ONE CHAR   T2-ONE CHAR
    A      T2-ONE CHAR      T1-ONE CHAR       T2-ONE CHAR   T2-ONE CHAR

As you can see, your theory of the Cartesian join was correct.
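
For comparison, here is a sketch of the same two-table query with distinct aliases. The aliases t1 and t2 and the column aliases t1_descr and t2_descr are names assumed here for illustration. Note that in Oracle the USING column has to be referenced without a table qualifier.

```sql
-- Same tables as above, but each join member gets its own alias,
-- so every projected column can be traced back to one table.
select val,                       -- USING column: referenced unqualified
       t1.descr as t1_descr,
       t1.t1_real_descr,
       t2.descr as t2_descr,
       t2.t2_real_descr
from   tabNull1 t1
       inner join tabNull2 t2 using (val);
```

With the sample data this should return a single row, pairing the 'A' row of tabNull1 with the 'A' row of tabNull2.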
