Translate

NOT NULL and INDEXED Column

How INDEXED Columns behaves when not defined as NOT NULL?

Sometimes, while Declaring Table columns we do not declare them as NOT NULL even if we know that column will never be NULL. In such cases Optimizer has to assume that column can accept NULL values while generating Execution Plan which results in Full Table Scan.

Let’s check this with the help of below Example:
I have a Table below without any Constraint and Index.
SQL> DESC EMPLOYEE_DATA;
 Name                                                                    Null?    Type
 ----------------------------------------------------------------------- -------- ------------------------------------------------
 EMP_NO                                                                           NUMBER
 ENAME                                                                            VARCHAR2(10)
 DEPTNO                                                                           NUMBER
 HIREDATE                                                                         DATE
 DNAME                                                                            VARCHAR2(10)
 DLOC                                                                             VARCHAR2(10)
 SALARY                                                                           NUMBER(8,2)
 JOB_ID                                                                           VARCHAR2(20)

SQL> SET AUTOTRACE TRACENLY EXP
SQL> SELECT * From EMPLOYEE_DATA Where HIREDATE IS NULL;
Execution Plan
----------------------------------------------------------
Plan hash value: 4276744253
-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |     1 |    81 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMPLOYEE_DATA |     1 |    81 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("HIREDATE" IS NULL)

Note
-----
   - dynamic sampling used for this statement (level=2)
In above statement, Full Access path was followed which was obvious and expected that Optimizer will use Full Table Scan as there were no Index no Constraint been used.


Let’s create an Index on Above Table assuming that now Optimizer will use Index Scan Path.

SQL> CREATE INDEX INDX_HD_NAME ON EMPLOYEE_DATA(HIREDATE,ENAME);
Index created.

SQL> SELECT * From EMPLOYEE_DATA Where HIREDATE IS NULL;
Execution Plan
----------------------------------------------------------
Plan hash value: 4276744253
-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |     1 |    81 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMPLOYEE_DATA |     1 |    81 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("HIREDATE" IS NULL)
Note
-----
   - dynamic sampling used for this statement (level=2)


So, even after creating Index on Column it has followed Full Scan access path. Why?
The above index supports the query only if the column ENAME has a NOT NULL constraint. 

Removing NOT NULL Constraint renders the Index Unusable state for the above Query.

Let’s try declaring ENAME column as NOT NULL and see how it helps in generating better execution plan.
SQL> Alter table EMPLOYEE_DATA modify ENAME NOT NULL;
Table altered.

SQL> DESC EMPLOYEE_DATA;
 Name                                                                    Null?    Type
 ----------------------------------------------------------------------- -------- ------------------------------------------------
 EMP_NO                                                                           NUMBER
 ENAME                                                                   NOT NULL VARCHAR2(10)
 DEPTNO                                                                           NUMBER
 HIREDATE                                                                         DATE
 DNAME                                                                            VARCHAR2(10)
 DLOC                                                                             VARCHAR2(10)
 SALARY                                                                           NUMBER(8,2)
 JOB_ID                                                                           VARCHAR2(20)

SQL> SELECT * From EMPLOYEE_DATA Where HIREDATE IS NULL;
Execution Plan
----------------------------------------------------------
Plan hash value: 1371098230
---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     1 |    81 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMPLOYEE_DATA |     1 |    81 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | INDX_HD_NAME  |     2 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("HIREDATE" IS NULL)
Note
-----
   - dynamic sampling used for this statement (level=2)

As you can see Optimizer has followed Index Scan path as Optimizer doesn’t need to Assume about NULLs.


Let’s do the same Exercise with Function returning your Table column instead of directly using them in your Query.
SQL> Create Index INDX_EMPNO ON EMPLOYEE_DATA(EMP_NO);
Index created.

SQL> CREATE OR REPLACE FUNCTION RETURN_EMPNO(EID IN NUMBER) RETURN NUMBER
DETERMINISTIC
IS BEGIN
RETURN EID;
END;
  /
Function created.

You will need to create DETERMINISITC Function when you know that function will always return Unique value. If we skip DETERMINISTIC in above function, then you will receive below error while referring it creating Index.

ORA-30553: The function is not deterministic


If the PL/SQL function is truly deterministic  (it always returns a unique value for all input values) then you can just add the "deterministic" keyword to the function definition. If the function is deterministic, mark it DETERMINISTIC. If it is not deterministic (it depends on package state, database state, current time, or anything other than the function inputs) then do not create the index. The values returned by a deterministic function should not change even when the function is rewritten or recompiled.
SQL> Drop Index INDX_HD_NAME;
Index dropped.
SQL> CREATE INDEX INDX_HD_NAME ON EMPLOYEE_DATA(HIREDATE,RETURN_EMPNO(EMP_NO));
Index created.

SQL> SET AUTOTRACE TRACEONLY EXP
SQL> SELECT * From EMPLOYEE_DATA Where HIREDATE IS NULL;
Execution Plan
----------------------------------------------------------
Plan hash value: 4276744253
-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |     1 |    81 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMPLOYEE_DATA |     1 |    81 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("HIREDATE" IS NULL)
Note
-----
   - dynamic sampling used for this statement (level=2)

We know that Function name RETURN_EMPNO will always return Unique and NOT NULL Values but Optimizer doesn’t have any idea what this function does. For Database, this is just a function which returns a Number. Although, we have Index created on EMP_NO but still Database can not use the Index for the Query.

So, if you know that Function never returns NULL, then you can change the Query to make use of Created Index by Optimizer.
SQL> SELECT * From EMPLOYEE_DATA Where HIREDATE IS NULL AND RETURN_EMPNO(EMP_NO) IS NOT NULL;
Execution Plan
----------------------------------------------------------
Plan hash value: 1371098230
---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     1 |    94 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMPLOYEE_DATA |     1 |    94 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | INDX_HD_NAME  |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("HIREDATE" IS NULL)
       filter("MYUSER"."RETURN_EMPNO"("EMP_NO") IS NOT NULL)
Note
-----
   - dynamic sampling used for this statement (level=2)



The extra condition in the where clause is always true and therefore does not change the result. Nevertheless, the Oracle database recognizes that you only query rows that must be in the index per definition.




1 comment: