June 4, 2010
I encountered an interesting test case in the “Oracle SQL Recipes” book, but I fear that my degree in mathematics is causing me to fail to fully comprehend the test case. I developed a parallel test case that possibly answers the questions that are left unanswered. Here is my test case:
CREATE TABLE T8 (
  NUMBER_DIV    NUMBER,
  BIN_DBL_DIV   BINARY_DOUBLE,
  NUMBER_VALUE  NUMBER,
  BIN_DBL_VALUE BINARY_DOUBLE,
  NUMBER_VALUE2 NUMBER(7,2));

INSERT INTO T8
SELECT
  ROWNUM,
  ROWNUM,
  1000/ROWNUM,
  1000/ROWNUM,
  1000/ROWNUM
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

COMMIT;

COLUMN NUMBER_DIV FORMAT 99999
COLUMN BIN_DBL_DIV FORMAT 99999
COLUMN NUMBER_VALUE FORMAT 99990.00000000000000000000
COLUMN BIN_DBL_VALUE FORMAT 99990.00000000000000000000
COLUMN NUMBER_VALUE2 FORMAT 99990.000
COLUMN VND FORMAT 999
COLUMN VBDD FORMAT 9999
COLUMN VNV FORMAT 999
COLUMN VBDV FORMAT 9999
COLUMN VNV2 FORMAT 9999

SET LINESIZE 140
SET TRIMSPOOL ON
SET PAGESIZE 1000

SPOOL NUMBERTEST.TXT

SELECT
  NUMBER_DIV,
  BIN_DBL_DIV,
  NUMBER_VALUE,
  BIN_DBL_VALUE,
  NUMBER_VALUE2,
  VSIZE(NUMBER_DIV) VND,
  VSIZE(BIN_DBL_DIV) VBDD,
  VSIZE(NUMBER_VALUE) VNV,
  VSIZE(BIN_DBL_VALUE) VBDV,
  VSIZE(NUMBER_VALUE2) VNV2
FROM
  T8
ORDER BY
  NUMBER_DIV;

SPOOL OFF
Quoting from page 190 of the book:
With the value one-third stored in each column, we can use the VSIZE function to show it was much more complicated to store this [the value of 1/3] with decimal precision [using the NUMBER datatype], taking nearly three times the space [when compared to the BINARY_DOUBLE datatype].
Here is the output from my script for the row containing the value one-third:
NUMBER_DIV BIN_DBL_DIV            NUMBER_VALUE           BIN_DBL_VALUE NUMBER_VALUE2  VND  VBDD  VNV  VBDV  VNV2
---------- ----------- ----------------------- ----------------------- ------------- ---- ----- ---- ----- -----
      3000        3000      0.3333333333333333      0.3333333333333333         0.330    2     8   21     8     2
As the book states, the column with the NUMBER datatype requires 21 bytes, while the column with the BINARY_DOUBLE datatype requires just 8 bytes to store the value one-third in the table. What, if anything, is wrong with the comparison?
Hint: To conserve space, the column format for the NUMBER_VALUE and BIN_DBL_VALUE columns in the above output was changed from:
99990.00000000000000000000

to:

99990.0000000000000000
There is an interesting description of the NUMBER and BINARY_DOUBLE datatypes (and other similar datatypes) in the book “Troubleshooting Oracle Performance”.
“As the book states, the column with the NUMBER datatype requires 21 bytes, while the column with the BINARY_DOUBLE datatype requires just 8 bytes to store the value one-third in the table. What, if anything, is wrong with the comparison?”
I would say that there are possibly two things wrong with this comparison.
1. The book has chosen a single value to compare sizes instead of looking at the range of values. Oracle’s NUMBER datatype is of variable length. Based on the test case presented, the NUMBER_VALUE column in some cases will use as little as 2 bytes, while the BINARY_DOUBLE column remains fixed at 8 bytes. Depending on your needs, the NUMBER datatype could be more efficient than a BINARY_DOUBLE.
2. I don’t think it’s fair to compare a NUMBER to a BINARY_DOUBLE. The default NUMBER has much more precision than anything a BINARY_DOUBLE can provide. The cost of that extra precision is more storage, as shown in the test case. A 64-bit double has approximately 16 digits of precision, as shown when using an unmodified column format:
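The two properties claimed above (a fixed 8-byte size and roughly 16 significant decimal digits for a 64-bit double) can be confirmed outside the database. A minimal Python sketch, not part of the original test case, using the standard struct module:

```python
import struct

third = 1000 / 3000          # one-third as an IEEE 754 64-bit double (BINARY_DOUBLE)

# A double always occupies exactly 8 bytes, regardless of the value stored
print(len(struct.pack('<d', third)))        # 8

# Only about 16 decimal digits are significant; digits beyond that
# reflect the nearest representable binary value, not more "3" digits
print(f"{third:.20f}")                      # 0.33333333333333331483
```

Note that the 17th digit printed is a "1", not a "3" - the binary representation nearest to one-third was reached.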
Depending on the requirements and the precision needed, one could argue that NUMBER_VALUE does not equal BIN_DBL_VALUE, so what value is there in comparing the storage taken up by each?
I think a more appropriate comparison in this case, since the maximum value is 1000, would be a BINARY_DOUBLE to a NUMBER(20,16). This gives very similar precision to that of a BINARY_DOUBLE. When you compare the size of a NUMBER(20,16) to a BINARY_DOUBLE in this case, there is only a one byte difference. As shown:
In this case, NUMBER_VALUE2 was changed to the NUMBER(20,16) datatype.
Centinul,
I think that you just demonstrated one of the advantages of this type of blog article.
The hint about changing the column format for the NUMBER_VALUE and BIN_DBL_VALUE columns to save space was true, but you pinpointed the reason why I chose to strip exactly 4 decimal positions from the column format: there are supposed to be an infinite number of “3” digits after the decimal point, and as your comment shows, the next digit is a “1” for the BINARY_DOUBLE column. As you stated, we reached the limit of the 8 byte (64 bit) storage capacity (precision/scale) for the column. The column with the NUMBER datatype stored even more “3” digits:
Great idea to change the NUMBER_VALUE2 column to a NUMBER(20,16) datatype – I did not consider that approach in part because I did not think that the storage requirement would drop below roughly 21 bytes (I was wrong, obviously).
Curious now, how was Oracle able to store that many digits in the NUMBER(20,16) column in just 9 bytes?
The output, slightly cleaned up for comparison:
Now the next question is: What would be a valid comparison of the NUMBER and BINARY_DOUBLE datatypes? Would we need to consider the possibility that the BINARY_DOUBLE datatype column might be joined to or compared to a NUMBER column in another table? Would we need to consider what would happen when the BINARY_DOUBLE datatype column appears in the WHERE clause and a bind variable with a NUMBER datatype was on the other side of the equal sign? What if the BINARY_DOUBLE datatype column was indexed?
“Would we need to consider the possibility that the BINARY_DOUBLE datatype column might be joined to or compared to a NUMBER column in another table?”
Yes, see below.
“Would we need to consider what would happen when the BINARY_DOUBLE datatype column appears in the WHERE clause and a bind variable with a NUMBER datatype was on the other side of the equal sign?”
Yes, see below.
“What if the BINARY_DOUBLE datatype column was indexed?”
Generally speaking, comparing a NUMBER to a BINARY_DOUBLE results in the same implicit conversion issues that comparing characters to numbers does, as mentioned in your previous blog article “True or False – Why Isn’t My Index Getting Used?”
From the SQL Reference (http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_elements001.htm#i156865):
“If any of the operands is BINARY_DOUBLE, then Oracle attempts to convert all the operands implicitly to BINARY_DOUBLE before performing the operation.”
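The effect of that conversion rule can be sketched outside the database. In the analogy below (a Python sketch, not Oracle code), an exact decimal stands in for a NUMBER and a Python float (an IEEE 754 double) stands in for a BINARY_DOUBLE; casting the decimal to a double changes the value, which is why equality comparisons across the two types are unreliable:

```python
from decimal import Decimal

# A NUMBER-style value: exact decimal digits, more precision than a double holds
number_value = Decimal("0.33333333333333333333")

# The implicit conversion: the NUMBER operand is cast to a double
as_double = float(number_value)

# The cast changed the value, so comparing against the original digits fails
print(Decimal(as_double) == number_value)    # False
```

The same loss happens when Oracle casts an indexed NUMBER column to BINARY_DOUBLE: the converted column no longer matches the indexed values, so the index cannot be used for the comparison.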
I did a quick and dirty test. Here is the setup.
Version:
Indexes / Bind Variables:
Query #1: Comparing NUMBER_VALUE column to BINARY_DOUBLE bind variable
As suspected, Oracle converted the NUMBER_VALUE column to a BINARY_DOUBLE, thereby preventing indexed access.
Query #2: Comparing BIN_DBL_VALUE to a NUMBER bind variable
The same conversion applies, but this time not to the column, allowing for indexed access.
Query #3: Joining tables together on a BINARY_DOUBLE to NUMBER join condition
In this case it looks like Oracle has applied transitive closure (line 3 of the Predicate Information section), and as a side effect of the implicit conversion, a full table scan was performed instead of an index range scan.
Very well written. I think that you have just demonstrated the potential unintended side-effects caused by selecting a numeric datatype for data storage purposes without first giving consideration to how the data will be used/accessed.
The link that I provided for the “Troubleshooting Oracle Performance” book states that there are two advantages of the BINARY_DOUBLE (and BINARY_FLOAT) datatype compared to the NUMBER datatype:
1. The BINARY_DOUBLE (and BINARY_FLOAT) datatype implements the IEEE 754 standard, so a CPU can process the numbers directly, without their first being processed by Oracle’s internal libraries as is needed for the NUMBER datatype.
2. The BINARY_DOUBLE (and BINARY_FLOAT) datatype has a fixed length (always requires the same number of bytes).
Accurately describing how much more efficiently the BINARY_DOUBLE (and BINARY_FLOAT) datatype can be processed than the NUMBER datatype might not be possible – there are probably just too many variables (and too many number combinations). Additionally, whether the CPU uses big-endian or little-endian byte ordering (http://en.wikipedia.org/wiki/Endianness) could also be important.
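The endianness point can be made concrete: the same IEEE 754 double is the same 8 bytes in either byte order, just reversed. A small Python sketch (not part of the original discussion) using the struct module:

```python
import struct

third = 1 / 3

big    = struct.pack('>d', third)    # big-endian byte order
little = struct.pack('<d', third)    # little-endian byte order

print(big.hex())      # 3fd5555555555555
print(little.hex())   # 555555555555d53f

# Same 8 bytes, opposite order - the value itself is unchanged
assert big == little[::-1]
```

A CPU whose native byte order matches the stored order can load the value directly; otherwise a byte swap is needed, which is one of the platform-dependent variables mentioned above.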