To N or Not to N, is SYS_OP_C2C the Question; Oracle NVARCHAR Slow?

8 02 2015

February 8, 2015

I was recently contacted about an odd performance issue with a software product named TDM, where certain SQL statements submitted to an Oracle 9.2.x database required roughly 30 seconds to execute when a nearly instantaneous response was expected.  The person who reported the issue to me provided a sample SQL statement that appeared something similar to the following:

SELECT
  DATA
FROM 
  MVIS_DATA 
WHERE 
  TOOLID = N'112' 
  AND DATATYPE = 0 
  AND COMMISSIONID IS NULL 
  AND OPERATIONID IS NULL 
  AND COMMISSIONLISTPOS IS NULL;

In the SQL statement, notice the N character that is immediately before ‘112’ in the SQL statement.  The person indicated that the SQL statement executed quickly if that N character were removed from the SQL statement.  At this time the developer of that application is unwilling to release a bug fix to remove the N character from this (and likely other) SQL statements.

I did not initially have the table datatype descriptions (retrieved with DESC MVIS_DATA), so I made a couple of guesses about the datatypes.  What if the TOOLID column was defined as a number, and is it the primary key column for the table (indicating that there must be an index on that column)?  It might be the case that the developer of the application decided that in all SQL statements that are submitted with literal values (rather than using bind variables), that all numbers would be submitted in single quotes.  I created a testing table for a mock up in Oracle Database 11.2.0.2:

CREATE TABLE MVIS_DATA_NUM (
  TOOLID NUMBER,
  DATATYPE NUMBER,
  COMMISSIONID NUMBER,
  OPERATIONID NUMBER,
  COMMISSIONLISTPOS NUMBER,
  DATA VARCHAR2(100),
  PRIMARY KEY (TOOLID));
 
INSERT INTO
  MVIS_DATA_NUM
SELECT
  ROWNUM TOOLID,
  MOD(ROWNUM,2) DATATYPE,
  NULL COMMISSIONID,
  DECODE(MOD(ROWNUM,2),0,NULL,MOD(ROWNUM,2)) OPERATIONID,
  DECODE(MOD(ROWNUM,2),0,NULL,MOD(ROWNUM,2)) COMMISSIONLISTPOS,
  LPAD('A',100,'A') DATA
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;
 
COMMIT;
 
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'MVIS_DATA_NUM',CASCADE=>TRUE,NO_INVALIDATE=>FALSE)

With the testing table created with 100,000 rows, and statistics gathered for the table and primary key index, I then tried executing a query and retrieving the execution plan for that query so that I could determine if the Predicate Information section of the execution plan provided any clues.  I executed the following, the first SQL statement retrieved one row, and the second SQL statement retrieved the execution plan for the first SQL statement:

SET LINESIZE 140
SET PAGESIZE 1000
 
SELECT
  DATA
FROM 
  MVIS_DATA_NUM
WHERE 
  TOOLID = N'112' 
  AND DATATYPE = 0 
  AND COMMISSIONID IS NULL 
  AND OPERATIONID IS NULL 
  AND COMMISSIONLISTPOS IS NULL;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

The execution plan output is as follows:

SQL_ID  gzzrppktqkbmu, child number 0
-------------------------------------
SELECT   DATA FROM   MVIS_DATA_NUM WHERE   TOOLID = N'112'   AND
DATATYPE = 0   AND COMMISSIONID IS NULL   AND OPERATIONID IS NULL   AND
COMMISSIONLISTPOS IS NULL
 
Plan hash value: 1080991
 
---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| MVIS_DATA_NUM |     1 |   113 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | SYS_C0050817  |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("OPERATIONID" IS NULL AND "COMMISSIONLISTPOS" IS NULL AND
              "DATATYPE"=0 AND "COMMISSIONID" IS NULL))
   2 - access("TOOLID"=112)

Nothing too unusual in the above execution plan, the N’112′ portion of the SQL statement was simply changed by the query optimizer to the number 112, which permitted the use of the table’s primary key index.  Obviously, Oracle Database 11.2.0.2 contains a few optimizations that are not available in Oracle Database 9.2.x, so maybe the outcome would be different in Oracle Database 9.2.x.  However, when a number value is compared to a character (for example VARCHAR2) value, Oracle Database will attempt to implicitly convert the character value to a number value when performing the comparison, so the outcome should be the same on Oracle Database 9.2.x.

What if that TOOLID column were defined as VARCHAR?  Below is another test table with that column defined as VARCHAR2:

CREATE TABLE MVIS_DATA (
  TOOLID VARCHAR2(15),
  DATATYPE NUMBER,
  COMMISSIONID NUMBER,
  OPERATIONID NUMBER,
  COMMISSIONLISTPOS NUMBER,
  DATA VARCHAR2(100),
  PRIMARY KEY (TOOLID));
 
INSERT INTO
  MVIS_DATA
SELECT
  TO_CHAR(ROWNUM) TOOLID,
  MOD(ROWNUM,2) DATATYPE,
  NULL COMMISSIONID,
  DECODE(MOD(ROWNUM,2),0,NULL,MOD(ROWNUM,2)) OPERATIONID,
  DECODE(MOD(ROWNUM,2),0,NULL,MOD(ROWNUM,2)) COMMISSIONLISTPOS,
  LPAD('A',100,'A') DATA
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;
 
COMMIT;
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'MVIS_DATA',CASCADE=>TRUE,NO_INVALIDATE=>FALSE)

With the new test table created, let’s try the SQL statement again.  A 10053 trace file will be enabled in the event that you are interested in examining any potential automatic transformations of the SQL statement:

SET LINESIZE 140
SET PAGESIZE 1000
 
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SQL_10053V';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';
 
SELECT
  DATA
FROM 
  MVIS_DATA 
WHERE 
  TOOLID = N'112' 
  AND DATATYPE = 0 
  AND COMMISSIONID IS NULL 
  AND OPERATIONID IS NULL 
  AND COMMISSIONLISTPOS IS NULL;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));
 
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

The first of the above SQL statements output one row.  Here is the execution plan that was output:

SQL_ID  5pkwzs079jwu2, child number 0
-------------------------------------
SELECT   DATA FROM   MVIS_DATA WHERE   TOOLID = N'112'   AND DATATYPE =
0   AND COMMISSIONID IS NULL   AND OPERATIONID IS NULL   AND
COMMISSIONLISTPOS IS NULL
 
Plan hash value: 353063534
 
-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |       |       |   227 (100)|          |
|*  1 |  TABLE ACCESS FULL| MVIS_DATA |   122 | 13908 |   227   (3)| 00:00:01 |
-------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("OPERATIONID" IS NULL AND "COMMISSIONLISTPOS" IS NULL
              AND "DATATYPE"=0 AND SYS_OP_C2C("TOOLID")=U'112' AND "COMMISSIONID" IS
              NULL))

Notice the SYS_OP_C2C function in the Predicate Information section of the execution plan, that is a clue that there might be a performance problem lurking.  Also notice that the INDEX UNIQUE SCAN operation was replaced with a TABLE ACCESS FULL operation, that is also a clue that a performance problem may be lurking.  This section of the execution plan also indicates that the N’112′ portion of the SQL statement was changed to U’112′.  Consulting the 10053 trace file indicates that the query optimizer rewrote the submitted SQL statement to the following:

SELECT
  "MVIS_DATA"."DATA" "DATA"
FROM
  "TESTUSER"."MVIS_DATA" "MVIS_DATA"
WHERE
  SYS_OP_C2C("MVIS_DATA"."TOOLID")=U'112'
  AND "MVIS_DATA"."DATATYPE"=0
  AND "MVIS_DATA"."COMMISSIONID" IS NULL
  AND "MVIS_DATA"."OPERATIONID" IS NULL
  AND "MVIS_DATA"."COMMISSIONLISTPOS" IS NULL

SYS_OP_C2C is an internal characterset conversion function.

What happens to the execution plan if the N character is removed from the SQL statement?

SELECT
  DATA
FROM 
  MVIS_DATA 
WHERE 
  TOOLID = '112' 
  AND DATATYPE = 0 
  AND COMMISSIONID IS NULL 
  AND OPERATIONID IS NULL 
  AND COMMISSIONLISTPOS IS NULL;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

Below is the execution plan that was output:

SQL_ID  d70jxj3ypy60g, child number 0
-------------------------------------
SELECT   DATA FROM   MVIS_DATA WHERE   TOOLID = '112'   AND DATATYPE =
0   AND COMMISSIONID IS NULL   AND OPERATIONID IS NULL   AND
COMMISSIONLISTPOS IS NULL
 
Plan hash value: 1051843381
 
--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| MVIS_DATA    |     1 |   114 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | SYS_C0050814 |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("OPERATIONID" IS NULL AND "COMMISSIONLISTPOS" IS NULL AND
              "DATATYPE"=0 AND "COMMISSIONID" IS NULL))
   2 - access("TOOLID"='112')

Notice that the SYS_OP_C2C function does not appear in the Predicate Information section of the execution plan this time, and that the primary key index is used, rather than requiring a full table scan.  Unfortunately, the DBMS_XPLAN.DISPLAY_CURSOR function does not exist in Oracle Database 9.2.0.x, otherwise the reason for the performance problem may have been much more readily apparent to the person who reported the issue to me.

So, what is the purpose of that N character in the SQL statement?  I recall seeing SQL statements similar to this one in the past, which converts a character string to a date:

SELECT DATE'2015-02-08' FROM DUAL;
 
DATE'2015
---------
08-FEB-15

After a fair amount of digging through the Oracle documentation, I located the following note about that N character:

“The TO_NCHAR function converts the data at run time, while the N function converts the data at compilation time.”

Interesting.  That quote suggests that the author of the SQL statement may have been trying to convert ‘112’ to a NVARCHAR2 (or NCHAR).  Time for another test, the below script creates a table with the TOOLID column defined as NVARCHAR2, populates the table with 100,000 rows, and then collects statistics on the table and its primary key index:

CREATE TABLE MVIS_DATA_N (
  TOOLID NVARCHAR2(15),
  DATATYPE NUMBER,
  COMMISSIONID NUMBER,
  OPERATIONID NUMBER,
  COMMISSIONLISTPOS NUMBER,
  DATA VARCHAR2(100),
  PRIMARY KEY (TOOLID));
 
INSERT INTO
  MVIS_DATA_N
SELECT
  TO_CHAR(ROWNUM) TOOLID,
  MOD(ROWNUM,2) DATATYPE,
  NULL COMMISSIONID,
  DECODE(MOD(ROWNUM,2),0,NULL,MOD(ROWNUM,2)) OPERATIONID,
  DECODE(MOD(ROWNUM,2),0,NULL,MOD(ROWNUM,2)) COMMISSIONLISTPOS,
  LPAD('A',100,'A') DATA
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;
 
COMMIT;
 
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'MVIS_DATA_N',CASCADE=>TRUE,NO_INVALIDATE=>FALSE)

Executing the query against this table also returns one row:

SELECT
  DATA
FROM 
  MVIS_DATA_N 
WHERE 
  TOOLID = N'112' 
  AND DATATYPE = 0 
  AND COMMISSIONID IS NULL 
  AND OPERATIONID IS NULL 
  AND COMMISSIONLISTPOS IS NULL;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

The execution plan follows:

SQL_ID  1yuzz9rqkvnpv, child number 0
-------------------------------------
SELECT   DATA FROM   MVIS_DATA_N WHERE   TOOLID = N'112'   AND DATATYPE
= 0   AND COMMISSIONID IS NULL   AND OPERATIONID IS NULL   AND
COMMISSIONLISTPOS IS NULL
 
Plan hash value: 1044325464
 
--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| MVIS_DATA_N  |     1 |   119 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | SYS_C0050815 |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("OPERATIONID" IS NULL AND "COMMISSIONLISTPOS" IS NULL AND
              "DATATYPE"=0 AND "COMMISSIONID" IS NULL))
   2 - access("TOOLID"=U'112')

Notice in the above Predicate Information section that the SYS_OP_C2C function does not appear, and the N’112′ portion of the SQL statement was still changed to U’112′.  The execution plan also shows that the primary key index was used, while a full table scan was required when the TOOLID column was defined as a VARCHAR2.

The person who reported the issue to me later provide the output of DESC MVIS_DATA, which indicated that the TOOLID column was in fact defined as a VARCHAR2 column.  If this person were running a more recent version of Oracle Database, he might be able to create a function based index that uses the SYS_OP_C2C function on the TOOLID column.  Such an index might look something like this:

CREATE INDEX IND_TOOLID_FIX ON MVIS_DATA (SYS_OP_C2C("TOOLID"));

Gathering statistics on the table and its indexes, executing the original SQL statement, and outputting the execution plan:

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'MVIS_DATA',CASCADE=>TRUE,NO_INVALIDATE=>FALSE)
 
SELECT
  DATA
FROM 
  MVIS_DATA 
WHERE 
  TOOLID = N'112' 
  AND DATATYPE = 0 
  AND COMMISSIONID IS NULL 
  AND OPERATIONID IS NULL 
  AND COMMISSIONLISTPOS IS NULL;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

The first of the above queries output the expected one row, while the second query output the following execution plan:

SQL_ID  5pkwzs079jwu2, child number 1
-------------------------------------
SELECT   DATA FROM   MVIS_DATA WHERE   TOOLID = N'112'   AND DATATYPE =
0   AND COMMISSIONID IS NULL   AND OPERATIONID IS NULL   AND
COMMISSIONLISTPOS IS NULL
 
Plan hash value: 1497912695
 
----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| MVIS_DATA      |     1 |   125 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_TOOLID_FIX |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("OPERATIONID" IS NULL AND "COMMISSIONLISTPOS" IS NULL AND "DATATYPE"=0
              AND "COMMISSIONID" IS NULL))
   2 - access("MVIS_DATA"."SYS_NC00007$"=U'112')

In the Predicate Information section of the execution plan, notice the absence of the SYS_OP_C2C function on the TOOLID column – those values are pre-computed in the virtual column (SYS_NC00007$) created for the function based index.  An index range scan is reported in the execution plan, rather than an index unique scan (the function based index, when created, was not declared as unique), so the SQL statement should execute much faster than the roughly 30 seconds required by the SQL statement without the function based index.

So, what are the options that were mentioned above?

  • Have the application programmer fix the SQL statements.
  • Upgrade to a version of Oracle Database (if that version is supported by the application) that supports the SYS_OP_C2C function, and create a function based index using that function.
  • If the TOOLID column only contains numbers, that column should be defined as NUMBER, rather than VARCHAR2.  Consider redefining that column as a NUMBER datatype.
  • If that N character is always used when this column is referenced, that column probably should be defined as NVARCHAR2 rather than VARCHAR2.  Consider redefining that column as a NVARCHAR2 datatype.
  • Consider that the application is working as designed, and that it is nice to receive 30 second breaks every now and then.
  • Take a DUMP and share it with the application developer.  On second thought, such an approach may have limited success in helping to solve this problem.

Here is a little experiment with the DUMP function, which reveals Oracle’s internal representation of data – refer to the Internal datatypes and datatype codes in the Oracle documentation for help in decoding the Typ= values.

SELECT DUMP(112) A FROM DUAL;
 
A
---------------------
Typ=2 Len=3: 194,2,13
 
/* ------------------- */
SELECT DUMP(TO_CHAR(112)) A FROM DUAL;
 
A
---------------------
Typ=1 Len=3: 49,49,50
 
/* ------------------- */
SELECT DUMP('112') A FROM DUAL;
 
A
----------------------
Typ=96 Len=3: 49,49,50
 
/* ------------------- */
SELECT DUMP(N'112') A FROM DUAL;
 
A
----------------------------
Typ=96 Len=6: 0,49,0,49,0,50
 
/* ------------------- */
SELECT DUMP(SYS_OP_C2C('112'))  A FROM DUAL;
 
A
----------------------------
Typ=96 Len=6: 0,49,0,49,0,50
 
/* ------------------- */
SELECT DUMP(TO_NCHAR('112')) A FROM DUAL;
 
A
---------------------------
Typ=1 Len=6: 0,49,0,49,0,50

It is possibly interesting to note that the internal representation for N’112′ is CHAR (or NCHAR), while the internal representation for TO_NCHAR(‘112’) (and TO_NCHAR(112)) is VARCHAR2 (or NVARCHAR2).

This blog’s statistics indicate that the search engine search term Oracle NVARCHAR slow resulted in two page views of this blog yesterday.  I can’t help but wonder if the person who performed that search might have been helped by some of the above analysis.