Why Doesn’t this SQL Work?

2 03 2010

March 2, 2010

I read a lot of computer books – a fair number of those are on the topic of Oracle, and a portion of those are specific to writing SQL that executes on Oracle Database.  I also spend time browsing the Internet looking for interesting articles.  I found an interesting SQL statement in a couple of places on the Internet, so I thought that I would share the SQL statement:

SELECT
  BOOK_KEY
FROM
  BOOK
WHERE
  NOT EXISTS (SELECT BOOK_KEY FROM SALES);

The SQL statement can be found here:
http://books.google.com/books?id=xJ0fLjQFUFcC&pg=PA105#v=onepage&q=&f=false

And here (as well as a half-dozen other places on the Internet):
http://deepthinking99.wordpress.com/2009/11/20/rewriting-sql-for-faster-performance/

Deep thinking… something is wrong with that SQL statement.  Maybe we need a test script to see the problem?

CREATE TABLE T5 AS
SELECT
  ROWNUM BOOK_KEY
FROM
  DUAL
CONNECT BY
  LEVEL<=20;

CREATE TABLE T6 AS
SELECT
  ROWNUM*2 BOOK_KEY
FROM
  DUAL
CONNECT BY
  LEVEL<=20;

Let’s pretend that table T5 is the table BOOK and table T6 is the table SALES.  The SQL statement would look like this using our test tables:

SELECT
  BOOK_KEY
FROM
  T5
WHERE
  NOT EXISTS (SELECT BOOK_KEY FROM T6);

Both of the above links go on to suggest that a transformed and faster version of the above SQL statement would look like this:

SELECT
  B.BOOK_KEY
FROM
  T5 B,
  T6 S
WHERE
  B.BOOK_KEY=S.BOOK_KEY(+)
  AND S.BOOK_KEY IS NULL;

I suggest that both of the above links (and the 6+ other links found through a Google search) are clearly wrong – the first SQL statement is obviously faster.  Don’t believe me?  Put 1,000,000 rows in each table and time how long it takes to transfer the results back to the client computer.  How confident am I?  Take a look:

SELECT
  BOOK_KEY
FROM
  T5
WHERE
  NOT EXISTS (SELECT BOOK_KEY FROM T6);

no rows selected
--
SELECT
  B.BOOK_KEY
FROM
  T5 B,
  T6 S
WHERE
  B.BOOK_KEY=S.BOOK_KEY(+)
  AND S.BOOK_KEY IS NULL;

  BOOK_KEY
----------
         5
         3
        15
        19
        17
         7
         9
        13
         1
        11

So, if each table contained 1,000,000 rows, which SQL statement would return the result set to the client the fastest?

Lesson 1: if you plan to publish something, whether in book form or on the Internet, make certain that what you publish actually works (or at least looks like you put some effort into it).

Lesson 2: if you plan to copy someone else’s work and post it on your website/blog make certain that what you copy and pass off as your own actually works.

Lesson 3: don’t trust everything that you read on the Internet or in a book without first verifying that the information is correct, even if you find the information on your favorite website.

Makes you wonder if someone would suggest replacing a pure SQL solution with a combined SQL and PL/SQL solution for the purpose of improving performance.  No, that would be silly.  Pardon me while I go re-sequence the 64 bits to keep them from chattering as they pass through the router… maybe I should try to oil the bits or use a bigger router.  On second thought, I’ll just use a hammer (putting down the 28oz framing hammer to grab the small hammer, those bit break too easily).


Actions

Information

15 responses

2 03 2010
Laurent Schneider

indeed, nice query, not necessarly eye-popping if you do not test what you write!

2 03 2010
Charles Hooper

Laurent,

Thanks for stopping by. The mistake in the SQL statement is probably easy to make, and easy to identify as a problem with a simple test. Of course, if the code logic was used in a much more complicated SQL statement, it might be far more difficult to determine that the SQL statement is not providing the expected output, and that may lead to rather unfortunate data accuracy problems.

2 03 2010
Simon Fletcher

Of course it won’t work, there is no join between the inner and outer query.

There query should look like this:
SELECT
BOOK_KEY
FROM
T5
WHERE
NOT EXISTS (SELECT BOOK_KEY FROM T6 WHERE T5.BOOK_KEY = T6.BOOK_KEY);

2 03 2010
Charles Hooper

Simon,

Thanks for stopping by. Your fix is correct, however… now the original documents that recommend the modification to the SQL statement do not make sense, at least with any supported release of Oracle Database. I was hoping that someone would supply the corrected SQL statement, thanks.

A demonstration using my automated tool for generating DBMS_XPLAN output

The SQL statement from the web pages that actually worked as expected:

SQL_ID  9wygcmywqnakk, child number 0
-------------------------------------
SELECT B.BOOK_KEY  FROM T5 B , T6 S WHERE     B.BOOK_KEY=S.BOOK_KEY(+)    AND S.BOOK_KEY IS NULL
 
Plan hash value: 4037983709
 
----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|*  1 |  HASH JOIN ANTI    |      |      1 |      1 |     10 |00:00:00.01 |       6 |  1517K|  1517K| 1043K (0)|
|   2 |   TABLE ACCESS FULL| T5   |      1 |     20 |     20 |00:00:00.01 |       3 |       |       |          |
|   3 |   TABLE ACCESS FULL| T6   |      1 |     20 |     20 |00:00:00.01 |       3 |       |       |          |
----------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("B"."BOOK_KEY"="S"."BOOK_KEY")
 
Note
-----
   - dynamic sampling used for this statement

Your correctly fixed version of the SQL statement:

SQL_ID  0chf1vyy804cz, child number 0
-------------------------------------
SELECT BOOK_KEY  FROM T5 WHERE     NOT EXISTS (SELECT BOOK_KEY FROM T6 WHERE T5.BOOK_KEY = T6.BOOK_KEY)
 
Plan hash value: 4037983709
 
----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|*  1 |  HASH JOIN ANTI    |      |      1 |      1 |     10 |00:00:00.01 |       6 |  1517K|  1517K| 1111K (0)|
|   2 |   TABLE ACCESS FULL| T5   |      1 |     20 |     20 |00:00:00.01 |       3 |       |       |          |
|   3 |   TABLE ACCESS FULL| T6   |      1 |     20 |     20 |00:00:00.01 |       3 |       |       |          |
----------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T5"."BOOK_KEY"="T6"."BOOK_KEY")
 
Note
-----
   - dynamic sampling used for this statement

Still not obvious why the plans look very, very similar on Oracle 10.2.0.x? Let’s look at a portion of the 10053 trace file:

The SQL statement from the web pages that actually worked as expected:

******************************************
Current SQL statement for this session:
SELECT B.BOOK_KEY  FROM T5 B , T6 S WHERE  
  B.BOOK_KEY=S.BOOK_KEY(+) 
  AND S.BOOK_KEY IS NULL 
*******************************************
...
*********************************
(newjo-save)    [1 0 ]
Final - All Rows Plan:  Best join order: 1
  Cost: 4.5107  Degree: 1  Card: 1.0000  Bytes: 26
  Resc: 4.5107  Resc_io: 4.0000  Resc_cpu: 3241267
  Resp: 4.5107  Resp_io: 4.0000  Resc_cpu: 3241267
kkoipt: Query block SEL$6E71C6F6 (#0)
******* UNPARSED QUERY IS *******
SELECT "B"."BOOK_KEY" "BOOK_KEY" FROM "TESTUSER"."T5" "B","TESTUSER"."T6" "S" WHERE "B"."BOOK_KEY"="S"."BOOK_KEY"
kkoqbc-end
          : call(in-use=41424, alloc=49080), compile(in-use=43232, alloc=47000)
apadrv-end: call(in-use=41424, alloc=49080), compile(in-use=44176, alloc=47000)

Your correctly fixed version of the SQL statement:

******************************************
Current SQL statement for this session:
SELECT BOOK_KEY  FROM T5 WHERE  
  NOT EXISTS (SELECT BOOK_KEY FROM T6 WHERE T5.BOOK_KEY = T6.BOOK_KEY) 
*******************************************
...
*********************************
(newjo-save)    [0 1 ]
Final - All Rows Plan:  Best join order: 1
  Cost: 4.5107  Degree: 1  Card: 1.0000  Bytes: 26
  Resc: 4.5107  Resc_io: 4.0000  Resc_cpu: 3241267
  Resp: 4.5107  Resp_io: 4.0000  Resc_cpu: 3241267
kkoipt: Query block SEL$5DA710D3 (#1)
******* UNPARSED QUERY IS *******
SELECT "T5"."BOOK_KEY" "BOOK_KEY" FROM "TESTUSER"."T6" "T6","TESTUSER"."T5" "T5" WHERE "T5"."BOOK_KEY"="T6"."BOOK_KEY"
kkoqbc-end
          : call(in-use=42280, alloc=49080), compile(in-use=66776, alloc=105192)
apadrv-end: call(in-use=42280, alloc=49080), compile(in-use=67720, alloc=105192)

Those SQL statements directly below the “******* UNPARSED QUERY IS *******” lines from the 10053 trace files sure do look to be similar.

This SQL statement transformation might be a news-worthy development, especially for those who are still working from Oracle 8i SQL references.

3 03 2010
Simon Fletcher

Gone are the days when I could predict what the optimizer would do (Oracle 7.3 was the last release I think).

The original text is basically explaining that NOT EXISTS is preferred to NOT IN. Unfortunately the author forgets to put in the join to correlate the sub-query. He then goes on to explain that an outer join is often quicker than a NOT EXISTS. Although you’ve demonstrated current releases of the optimiser handle them the same.

Interesting that the author didn’t suggest a MINUS which in some circumstances can be even quicker.
e.g.
SELECT BOOK_KEY FROM T5
MINUS
SELECT BOOK_KEY FROM T6

2 03 2010
Niall Litchfield

Unfortunately the likely original author has a bit of a problem with publishing code that doesn’t do what it says. My personal favourite is the alert.log monitoring using external tables. Unusually for this author we get the table definition first complete with invalid name for an alert log, but hey I make typos as much as the next guy so I’ll forgive that.

create directory BDUMP as '/u01/app/oracle/admin/mysid/bdump';

create table 
   alert_log ( msg varchar2(80) )
organization external (
   type oracle_loader
   default directory BDUMP
   access parameters (
      records delimited by newline
   )
   location('alrt_mysid.log')
)
reject limit 1000;

No the real joy is in the output. See if you can spot what’s wrong.

select msg from alert_log where msg like '%ORA-00600%';

ORA-00600: internal error code, arguments: [17034], [2940981512], [0], [], [], [ ], [], [] 
ORA-00600: internal error code, arguments: [18095], [0xC0000000210D8BF8], [], [], [], [], []
ORA-00600: internal error code, arguments: [4400], [48], [], [], []
2 03 2010
Niall Litchfield

ah and this piece of code has also been published elsewhere, including appparently on expertsexchange (I can only see the google result I don’t have a subscription to that irritating site)

2 03 2010
Charles Hooper

Niall,

Thanks for stopping by. That, unfortunately, is the problem. When something is repeated often enough by enough people, there is a good chance that others will start believing that what is stated is true. If it had not been for Simon’s post in this thread, someone might be tempted to file a bug report on Metalink to report that Oracle is returning the wrong query results. 🙂

I do not know who originally created that piece of SQL code. A Google search finds a blog article dated October 10, 2008, another blog article dated July 27, 2009, a news article dated July 5, 2009, and a book with a copywrite date of 2004 – but the SQL statement might have been kicking around long before 2004.

2 03 2010
Charles Hooper

Niall,

That is a tough problem.

When you are counting on the character of the code to tell you if you have ERRORed, maybe you should be counting on using something else? At least that is what Word’s character counter told me.

2 03 2010
Maxim

Well, for given external table definition, at least 2 rows from the provided output should have been rejected – yet again bug report for Metalink ?

Best regards

Maxim

3 03 2010
Niall Litchfield

Hi Maxim

No need to get as far as that, you’ll find that you can’t insert those two rows into that table in that fashion. It’s rather difficult to avoid the conclusion that the output was most likely made up.

8 03 2010
joel garry

I followed the google books reference to Amazon, and boy were those reviews entertaining! Thanks Charles, et al.

By the by, if you could put date as well as time on the comments, that might be userful.

8 03 2010
Charles Hooper

Joel, thanks for stopping by.

I guess that I should have looked at the book reviews before quoting from that book (I have not read that book). On a positive note, the book did receive four 5 star reviews, so someone liked the book (maybe the owner of the DeepThinking site). I think that I recognize the names of a couple of the people who wrote some of those reviews.

The display of the dates is one of the few problems that I have with this WordPress theme – that is why I write the date on the first line of each article. I have no control over the formatting of the dates. The dates of the comments are actually at the right side of the comment:
8
03
2010

I guess that is supposed to mean March 8, 2010.

9 03 2010
Timur Akhmadeev

>I have no control over the formatting of the dates.
Charles, you can change date and time formats on the Settings->General page of the WordPress admin tool.

9 03 2010
Charles Hooper

Timur,

Thanks for the suggestion. I believe that those settings worked with the original WordPress theme that I used, but that WordPress theme did not handle wide code sections very well. When I switched to the current “Freshy” theme the settings on that page no longer controlled the format of the dates for the blog articles nor for the comments. Looking through the raw CSS stylesheet, I do not see how the date format was changed by the “Freshy” theme.

This is just a minor issue, so I am not too concerned about the date format.

Leave a reply to Timur Akhmadeev Cancel reply