Measuring Numbers – Is this a Valid Comparison?

4 06 2010

June 4, 2010

I encountered an interesting test case in the “Oracle SQL Recipes” book, but I fear that my degree in mathematics is causing me to fail to fully comprehend the test case.  I developed a parallel test case that possibly answers the questions that are left unanswered.  Here is my test case:

CREATE TABLE T8 (
  NUMBER_DIV NUMBER,
  BIN_DBL_DIV BINARY_DOUBLE,
  NUMBER_VALUE NUMBER,
  BIN_DBL_VALUE BINARY_DOUBLE,
  NUMBER_VALUE2 NUMBER(7,2));

INSERT INTO
  T8
SELECT
  ROWNUM,
  ROWNUM,
  1000/ROWNUM,
  1000/ROWNUM,
  1000/ROWNUM
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

COMMIT;

COLUMN NUMBER_DIV FORMAT 99999
COLUMN BIN_DBL_DIV FORMAT 99999
COLUMN NUMBER_VALUE FORMAT 99990.00000000000000000000
COLUMN BIN_DBL_VALUE FORMAT 99990.00000000000000000000
COLUMN NUMBER_VALUE2 FORMAT 99990.000
COLUMN VND FORMAT 999
COLUMN VBDD FORMAT 9999
COLUMN VNV FORMAT 999
COLUMN VBDV FORMAT 9999
COLUMN VNV2 FORMAT 9999

SET LINESIZE 140
SET TRIMSPOOL ON
SET PAGESIZE 1000
SPOOL NUMBERTEST.TXT

SELECT
  NUMBER_DIV,
  BIN_DBL_DIV,
  NUMBER_VALUE,
  BIN_DBL_VALUE,
  NUMBER_VALUE2,
  VSIZE(NUMBER_DIV) VND,
  VSIZE(BIN_DBL_DIV) VBDD,
  VSIZE(NUMBER_VALUE) VNV,
  VSIZE(BIN_DBL_VALUE) VBDV,
  VSIZE(NUMBER_VALUE2) VNV2
FROM
  T8
ORDER BY
  NUMBER_DIV;

SPOOL OFF

Quoting from page 190 of the book:

With the value one-third stored in each column, we can use the VSIZE function to show it was much more complicated to store this [the value of 1/3] with  decimal precision [using the NUMBER datatype], taking nearly three times the space [when compared to the BINARY_DOUBLE datatype].

Here is the output from my script for the row containing the value one-third:

NUMBER_DIV BIN_DBL_DIV            NUMBER_VALUE           BIN_DBL_VALUE NUMBER_VALUE2  VND  VBDD  VNV  VBDV  VNV2
---------- ----------- ----------------------- ----------------------- ------------- ---- ----- ---- ----- -----
      3000        3000      0.3333333333333333      0.3333333333333333         0.330    2     8   21     8     2

As the book states, the column with the NUMBER datatype requires 21 bytes, while the column with the BINARY_DOUBLE datatype requires just 8 bytes to store the value one-third in the table.  What, if anything,  is wrong with the comparison?

Hint: To conserve space, the column format for the NUMBER_VALUE and BIN_DBL_VALUE columns in the above output was changed from:

99990.00000000000000000000
to:
99990.0000000000000000

There is an interesting description of the NUMBER and BINARY_DOUBLE (or other similar datatypes) datatypes in the book “Troubleshooting Oracle Performance“.


Actions

Information

4 responses

4 06 2010
Centinul

“As the book states, the column with the NUMBER datatype requires 21 bytes, while the column with the BINARY_DOUBLE datatype requires just 8 bytes to store the value one-third in the table. What, if anything, is wrong with the comparison?”

I would say that there are possibly two things wrong with this comparison.

1. The book has chosen a single value to compare sizes instead of looking at the range of values. Oracle’s NUMBER datatype is of variable length. Based on the test case presented the NUMBER_VALUE column in some cases will use as little as 2 bytes while the BINARY_DOUBLE remains fixed at 8 bytes. Depending on your needs the NUMBER datatype could be more efficient then a BINARY_DOUBLE.

2. I don’t think it’s fair to compare a NUMBER to a BINARY_DOUBLE. The default NUMBER has much more precision then anything a BINARY_DOUBLE can provide. The cost of that extra precision is more storage as shown in the test case. A 64-bit double has approximately 16 digits of precision as shown when using an unmodified column format:

NUMBER_DIV BIN_DBL_DIV                NUMBER_VALUE               BIN_DBL_VALUE               NUMBER_VALUE2  VND  VBDD  VNV  VBDV  VNV2
---------- ----------- --------------------------- --------------------------- --------------------------- ---- ----- ---- ----- -----
      3000        3000      0.33333333333333333333      0.33333333333333331000      0.33000000000000000000    2     8   21     8     2

Depending on the requirements, and precision needed, one could say that NUMBER_VALUE does not equal BIN_DBL_VALUE so what value is there in comparing the storage taken up by each?

I think a more appropriate comparison in this case since the maximum value is 1000 would be a BINARY_DOUBLE to a NUMBER(20,16). This will give a very similar precision to that of a BINARY_DOUBLE. When you compare the size of a NUMBER(20,16) to a BINARY_DOUBLE in this case there is only a one byte difference. As shown:

NUMBER_DIV BIN_DBL_DIV                NUMBER_VALUE               BIN_DBL_VALUE               NUMBER_VALUE2  VND  VBDD  VNV  VBDV  VNV2
---------- ----------- --------------------------- --------------------------- --------------------------- ---- ----- ---- ----- -----
      3000        3000      0.33333333333333333333      0.33333333333333331000      0.33333333333333330000    2     8   21     8     9

In this case NUMBER_VALUE2 was changed to NUMBER(20,16)

5 06 2010
Charles Hooper

Centinul,

I think that you just demonstrated one of the advantages of this type of blog article.

The hint about changing the column format for the NUMBER_VALUE and BIN_DBL_VALUE columns to save space was true, but you pinpointed the reason why I selected to to strip exactly 4 decimal positions from the column format – there are supposed to be an infinite number of “3” digits after the decimal point, and as your comment shows the next digit is a “1” digit for the BINARY_DOUBLE column. As you stated, we reached the limit of the 8 byte (64 bit) storage capacity (precision/scale) for the column. The column with the NUMBER datatype stored even more “3” digits:

COLUMN NUMBER_VALUE FORMAT 99990.00000000000000000000000000000000000000000
COLUMN BIN_DBL_VALUE FORMAT 99990.00000000000000000000000000000000000000000
 
SELECT
  NUMBER_VALUE,
  BIN_DBL_VALUE
FROM
  T8
WHERE
  NUMBER_DIV=3000;
 
                               NUMBER_VALUE
-------------------------------------------
0.33333333333333333333333333333333333333330
 
                              BIN_DBL_VALUE
-------------------------------------------
0.33333333333333331000000000000000000000000

Great idea to change the NUMBER_VALUE2 column to a NUMBER(20,16) datatype – I did not consider that approach in part because I did not think that the storage requirement would drop below roughly 21 bytes (I was wrong, obviously).

Curious now, how was Oracle able to store that many digits in the NUMBER(20,16) column in just 9 bytes?

SELECT
  DUMP(NUMBER_VALUE),
  DUMP(BIN_DBL_VALUE),
  DUMP(NUMBER_VALUE2)
FROM
  T8
WHERE
  NUMBER_DIV=3000;

The output, slightly cleaned up for comparison:

DUMP(NUMBER_VALUE):  Typ=2   Len=21: 192,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34
DUMP(BIN_DBL_VALUE): Typ=101 Len=8:  191,213,85,85,85,85,85,85
DUMP(NUMBER_VALUE2): Typ=2   Len=9:  192,34,34,34,34,34,34,34,34

Now the next question is: What would be a valid comparison of the NUMBER and BINARY_DOUBLE datatypes? Would we need to consider the possibility that the BINARY_DOUBLE datatype column might be joined to or compared to a NUMBER column in another table? Would we need to consider what would happen when the BINARY_DOUBLE datatype column appears in the WHERE clause and a bind variable with a NUMBER datatype was on the other side of the equal sign? What if the BINARY_DOUBLE datatype column was indexed?

5 06 2010
Centinul

“Would we need to consider the possibility that the BINARY_DOUBLE datatype column might be joined to or compared to a NUMBER column in another table?”

Yes, see below.

“Would we need to consider what would happen when the BINARY_DOUBLE datatype column appears in the WHERE clause and a bind variable with a NUMBER datatype was on the other side of the equal sign?”

Yes, see below.

What if the BINARY_DOUBLE datatype column was indexed?

Generally speaking, NUMBER to BINARY_DOUBLE results in the same implicit conversion issues that characters to numbers do as mentioned in your previous blog article “True or False – Why Isn’t My Index Getting Used?”

From the SQL Reference (http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_elements001.htm#i156865):

“If any of the operands is BINARY_DOUBLE, then Oracle attempts to convert all the operands implicitly to BINARY_DOUBLE before performing the operation.”

I did a quick and dirty test. Here is the setup.

Version:

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE	10.2.0.1.0	Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

Indexes / Bind Variables:

VAR bin_dbl BINARY_DOUBLE;
VAR num_val NUMBER;

BEGIN
     :bin_dbl := TO_BINARY_DOUBLE(1/3);
     :num_val := TO_NUMBER(1/3);
END;
/

CREATE INDEX bin_dbl_idx ON t8(bin_dbl_value);
CREATE INDEX num_val_idx ON t8(number_value);

Query #1: Comparing NUMBER_VALUE column to BINARY_DOUBLE bind variable

SQL> SELECT
  2    NUMBER_VALUE,
  3    BIN_DBL_VALUE
  4  FROM
  5    T8
  6  WHERE NUMBER_VALUE = :bin_dbl;

               NUMBER_VALUE               BIN_DBL_VALUE
--------------------------- ---------------------------
     0.33333333333333333333      0.33333333333333331000

SQL> 
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'LAST'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  2cy87y57wtwz7, child number 0
-------------------------------------
SELECT   NUMBER_VALUE,   BIN_DBL_VALUE FROM   T8 WHERE NUMBER_VALUE =
:bin_dbl

Plan hash value: 3870692729

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |    16 (100)|          |
|*  1 |  TABLE ACCESS FULL| T8   |     1 |    30 |    16   (7)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TO_BINARY_DOUBLE("NUMBER_VALUE")=:BIN_DBL)

As suspected Oracle converted the NUMBER_VALUE to a BINARY_DOUBLE, therefore avoiding indexed access

Query #2 Comparing BIN_DBL_VALUE to a NUMBER bind variable

SQL> SELECT
  2    NUMBER_VALUE,
  3    BIN_DBL_VALUE
  4  FROM
  5    T8
  6  WHERE BIN_DBL_VALUE = :num_val;

               NUMBER_VALUE               BIN_DBL_VALUE
--------------------------- ---------------------------
     0.33333333333333333333      0.33333333333333331000

SQL> 
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'LAST'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  5f560xuqruj7u, child number 0
-------------------------------------
SELECT   NUMBER_VALUE,   BIN_DBL_VALUE FROM   T8 WHERE BIN_DBL_VALUE = :num_val

Plan hash value: 3130934385

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T8          |     1 |    30 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BIN_DBL_IDX |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("BIN_DBL_VALUE"=TO_BINARY_DOUBLE(:NUM_VAL))

Same conversion applies but this time not to the column, allowing for indexed access.

Query #3 Joining tables together on a BINARY_DOUBLE to NUMBER join condition.

SQL> SELECT
  2    T8_1.NUMBER_VALUE,
  3    T8_1.BIN_DBL_VALUE
  4  FROM
  5    T8 T8_1
  6  , T8 T8_2
  7  WHERE T8_1.BIN_DBL_VALUE = T8_2.NUMBER_VALUE
  8  AND   T8_1.BIN_DBL_VALUE = :bin_dbl;

               NUMBER_VALUE               BIN_DBL_VALUE
--------------------------- ---------------------------
     0.33333333333333333333      0.33333333333333331000

SQL> 
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'LAST'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  fcrg6ajn472kw, child number 0
-------------------------------------
SELECT   T8_1.NUMBER_VALUE,   T8_1.BIN_DBL_VALUE FROM   T8 T8_1 , T8 T8_2 WHERE
T8_1.BIN_DBL_VALUE = T8_2.NUMBER_VALUE AND   T8_1.BIN_DBL_VALUE = :bin_dbl

Plan hash value: 1988820304

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |       |       |    18 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T8          |     1 |    30 |     2   (0)| 00:00:01 |
|   2 |   NESTED LOOPS              |             |     1 |    52 |    18   (6)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL        | T8          |     1 |    22 |    16   (7)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN         | BIN_DBL_IDX |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(TO_BINARY_DOUBLE("T8_2"."NUMBER_VALUE")=:BIN_DBL)
   4 - access("T8_1"."BIN_DBL_VALUE"=:BIN_DBL)
       filter("T8_1"."BIN_DBL_VALUE"=TO_BINARY_DOUBLE("T8_2"."NUMBER_VALUE"))

In this case it looks like Oracle has applied transitive closure (line 3 of Predicate Information) and with the side affect of implicit conversion a full table scan was done instead of an index range scan.

5 06 2010
Charles Hooper

Very well written. I think that you have just demonstrated the potential unintended side-effects that are caused by selecting the numeric datatype for data storage purposes without first giving consideration to how the data will be used/accessed.

The link that I provided for the “Troubleshooting Oracle Performance” book states that there are two advantages of the BINARY_DOUBLE (and BINARY_FLOAT) datatype compared to the NUMBER datatype:
1. The BINARY_DOUBLE (and BINARY_FLOAT) datatype implements the IEEE 754 standard so that a CPU can directly process the numbers without first being processed from Oracle’s internal libraries as is needed by the NUMBER datatype.
2. The BINARY_DOUBLE (and BINARY_FLOAT) datatype has a fixed length (always requires the same number of bytes).

Accurately describing how much more processing efficient the BINARY_DOUBLE (and BINARY_FLOAT) datatype is than the NUMBER datatype might not be possible – there are probably just too many variables (and too many number combinations). Additionally, whether the CPU uses big-endian or little-endian (http://en.wikipedia.org/wiki/Endianness) could also be important.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Follow

Get every new post delivered to your Inbox.

Join 140 other followers

%d bloggers like this: