Why Doesn’t this SQL Work?

2 03 2010

March 2, 2010

I read a lot of computer books – a fair number of those are on the topic of Oracle, and a portion of those are specific to writing SQL that executes on Oracle Database.  I also spend time browsing the Internet looking for interesting articles.  I found an interesting SQL statement in a couple of places on the Internet, so I thought that I would share the SQL statement:

SELECT
  BOOK_KEY
FROM
  BOOK
WHERE
  NOT EXISTS (SELECT BOOK_KEY FROM SALES);

The SQL statement can be found here:
http://books.google.com/books?id=xJ0fLjQFUFcC&pg=PA105#v=onepage&q=&f=false

And here (as well as a half-dozen other places on the Internet):
http://deepthinking99.wordpress.com/2009/11/20/rewriting-sql-for-faster-performance/

Deep thinking… something is wrong with that SQL statement.  Maybe we need a test script to see the problem?

CREATE TABLE T5 AS
SELECT
  ROWNUM BOOK_KEY
FROM
  DUAL
CONNECT BY
  LEVEL<=20;

CREATE TABLE T6 AS
SELECT
  ROWNUM*2 BOOK_KEY
FROM
  DUAL
CONNECT BY
  LEVEL<=20;

Let’s pretend that table T5 is the table BOOK and table T6 is the table SALES.  The SQL statement would look like this using our test tables:

SELECT
  BOOK_KEY
FROM
  T5
WHERE
  NOT EXISTS (SELECT BOOK_KEY FROM T6);

Both of the above links go on to suggest that a transformed and faster version of the above SQL statement would look like this:

SELECT
  B.BOOK_KEY
FROM
  T5 B,
  T6 S
WHERE
  B.BOOK_KEY=S.BOOK_KEY(+)
  AND S.BOOK_KEY IS NULL;

I suggest that both of the above links (and the 6+ other links found through a Google search) are clearly wrong – the first SQL statement is obviously faster.  Don’t believe me?  Put 1,000,000 rows in each table and time how long it takes to transfer the results back to the client computer.  How confident am I?  Take a look:

SELECT
  BOOK_KEY
FROM
  T5
WHERE
  NOT EXISTS (SELECT BOOK_KEY FROM T6);

no rows selected
--
SELECT
  B.BOOK_KEY
FROM
  T5 B,
  T6 S
WHERE
  B.BOOK_KEY=S.BOOK_KEY(+)
  AND S.BOOK_KEY IS NULL;

  BOOK_KEY
----------
         5
         3
        15
        19
        17
         7
         9
        13
         1
        11

So, if each table contained 1,000,000 rows, which SQL statement would return the result set to the client the fastest?

Lesson 1: if you plan to publish something, whether in book form or on the Internet, make certain that what you publish actually works (or at least looks like you put some effort into it).

Lesson 2: if you plan to copy someone else’s work and post it on your website/blog make certain that what you copy and pass off as your own actually works.

Lesson 3: don’t trust everything that you read on the Internet or in a book without first verifying that the information is correct, even if you find the information on your favorite website.

Makes you wonder if someone would suggest replacing a pure SQL solution with a combined SQL and PL/SQL solution for the purpose of improving performance.  No, that would be silly.  Pardon me while I go re-sequence the 64 bits to keep them from chattering as they pass through the router… maybe I should try to oil the bits or use a bigger router.  On second thought, I’ll just use a hammer (putting down the 28oz framing hammer to grab the small hammer, those bit break too easily).