June 6, 2010 (References Last Updated June 7, 2010)
Answering the Question – What can I do with SQL? A Handful of Errors Tarnish a Great Book
Last November I pre-ordered this book, hoping that it would pick up where the “Mastering Oracle SQL and SQL*Plus” book ended (at roughly Oracle Database release 10.1.0.3). The expectations were very high, and this book delivers for the most part. Some parts of the book left me scratching my head, thinking that this SQL statement or paragraph just cannot be correct. Even when taking those sections into account, this book conveys a significant amount of practical, useful information.
The book’s format of problem statement, solution, and how/why the solution works is the key to the success of this book. You will not find page after page of SQL statement syntax. The range of problem statements covered in this compact 500 page book is impressive, touching on a wide range of topics:
- Basic SELECT statements
- Manipulating numbers, dates, and character data
- Transaction management
- XML data
- Analytic functions
- Regular expressions
- Performance tuning
- Database troubleshooting
- Creating databases, tablespaces, tables, indexes, and constraints
- Thoroughly answers the question of why it is important to learn Oracle’s dialect of SQL. The book provides 233 problem situations, most with more than one variation of the problem along with the SQL statement for each variation.
- Includes interesting side stories that are not necessarily related to SQL coding, but help to keep the reader interested in the book contents. For example, a side story titled “Beer and Statistics Do Mix” that describes the invention of T-Tests.
- Typically states when a feature requires the Enterprise Edition or an additional cost license beyond the Enterprise Edition (Tuning Pack, for example).
- Most, if not all of the code samples from the book may be downloaded from the Apress website.
- Describes the LEAD, LAG, RANK, DENSE_RANK, ROW_NUMBER, SUM, MIN, MAX, NTILE and a couple of the other common analytic functions. Does not cover all of the analytic functions, for example: RATIO_TO_REPORT, FIRST_VALUE, LAST_VALUE, PERCENTILE_DISC, etc.
- Recipe 5-8 shows how to create a PL/SQL function that converts from base 10 to binary, octal, and hex.
- Recipe 11-9 shows an impressive example that creates a HTML chart from a SQL statement. It might be interesting to see this example extended to use floating objects with absolute positioning rather than HTML tables.
- Recipes 12-3 and 12-4 include PL/SQL functions that help to determine if a VARCHAR2 value can be safely converted to a NUMBER or DATE.
Problems (in order, not all code samples tested):
- In recipe 1-1, retrieving data from a table, the authors mentioned “currently set schema,” a concept which probably would have been much better to introduce in a later recipe. Especially in the early sections of the book, it seems that the authors struggled with what concepts to introduce early and what concepts required a more establish foundation. As such, people just learning SQL, or even just Oracle’s dialect, will struggle with the early sections of the book.
- Several recipes (1-4, 2-11, 2-14, 2-15, 2-16, 6-5, 6-14) use implicit date conversions, or explicit varchar to date conversions without specifying the format of the date contained in the varchar. This is a very poor practice to teach to people who are just beginning the process of learning SQL. A change in the NLS parameters, or even a change in the connection method from Oracle’s ODBC driver to Oracle’s OLEDB driver is sufficient to cause such SQL statements to fail. Unfortunately, the TO_DATE function was not introduced until chapter 6.
- Page 26 incorrectly stated, “for performance reasons, Oracle will implicitly sort the data to match the grouping desired.” As of Oracle Database 10.1 [edit: probably should state 10.2 for hash group by, previous releases did not necessarily need to perform a sort], more often than not a hashing algorithm will be used to perform GROUP BY operations, which means that there will be no implicit sort of the data. [Reference/Test Case] [Reference/Test Case]
- Recipe 3-5 attempted to explain when to use three different syntaxes to delete rows from a table (IN, EXISTS, and SELECT with USING clause). As may be confirmed with a 10053 trace file, the optimizer will often automatically re-write SQL statements from one form to another, for example from an IN clause to a standard join, or an EXISTS clause to a standard join. The authors probably should have mentioned the automatic query re-write capability of the cost-based optimizer, rather than stating something like this about an EXISTS clause example “this is not as elegant as the first solution, but might be more efficient.” [Reference/Test Case - Check the Comments Section Also]
- Recipe 5-11 demonstrates how to use the FOR UPDATE clause of a SELECT statement and demonstrates that the second session will be blocked until the first session commits or issues a rollback. At this stage of the book the NOWAIT clause is not introduced. Unfortunately, such a demonstration might be an invitation to “lost” updates.
- The query output for recipe 6-6 does not match the expected output, which may lead to confusion. The book shows that the result of the timezone conversion is “13-AUG-09 06.00.00.000000 PM AMERICA/LOS_ANGELES” when it should be “13-AUG-09 06.25.00.000000 PM AMERICA/LOS_ANGELES”
- Recipe 6-12, which uses the EXTRACT function, will produce incorrect/unexpected results, resulting in the MONTHS calculation increasing by one up to 15 days earlier than should be expected if used to determine the length of time an employee has worked. That recipe would have been the perfect time to introduce the MONTHS_BETWEEN SQL function (not included in the book). [Reference/Test Case - Check the Comments Section Also]
- Recipe 7-1 states that Oracle will never use a non-function based index on a column when that column appears inside the INSTR function that is present in a WHERE clause. That statement is incorrect, a hint, covered in recipe 19-12, is sufficient to allow Oracle to perform an index full scan operation on that column when the only predicate in the WHERE clause includes that index’s column wrapped in an INSTR function (tested on Oracle releases 10.2.0.4 through 220.127.116.11). [Reference/Test Case - Check the Comments Section Also]
- Recipe 8-2 states, “With the value one-third stored in each column (BINARY_DOUBLE and NUMBER datatypes), we can use the VSIZE function to show it was much more complicated to store this with decimal precision, taking nearly three times the space.” There are a couple of problems with this statement. First, a BINARY_DOUBLE column will always require eight bytes to store any number, while it takes a variable amount of space to store a number in a NUMBER column (for instance, the number 1,000,000 requires fewer bytes to store than the number 333). Second, the BINARY_DOUBLE column only maintained 16 significant digits to the right of the decimal point, while the NUMBER column maintained 40 significant digits to the right of the decimal point (the bytes required can be decreased by specifying a smaller number of significant digits to the right of the decimal point). [Reference/Test Case - Check the Comments Section Also]
- Recipe 8-6 states “The final two implicit defaults (of a CREATE SEQUENCE clause) cause Oracle to cache 20 values and not force strict first-come, first served ordering on those who call NEXTVAL.” This statement is either misleading or incorrect. The default will force the sequence numbers to be assigned to calling sessions in the sequential order specified by CREATE SEQUENCE – Oracle will not provide the numbers out of sequential order. There can be some degree of out of sequential order assignment in a RAC environment.
- Recipe 9-2, which includes a SQL statement that is intended to show the objects which are locked that are blocking other sessions, has a number of problems. 1) It is not RAC aware – it should be selecting from the GV$ views, rather than the V$ views. 2) The SQL statement explicitly looks for the BLOCK column to be equal to 1 – in a RAC environment it could potentially show a value of 2. 3) When performing a self-join on V$LOCK, the query is only joining on the ID1 column of V$LOCK, while the documentation shows a self-join on that view using the ID1, ID2, and TYPE columns. 4) There appears to be a missing join condition between V$LOCK and V$LOCKED_OBJECT that results in objects being identified as contributing to the blocking of the second session, when in fact that may not be the case. 5) The query seems to assume that all locks acquired are full table locks. [Reference/Test Case - Check the Comments Section Also]
- Page 436 suggests checking Metalink (My Oracle Support) Doc ID 555284.1 for a script to identify indexes that should be rebuilt. I suspect that that document looked very different prior to the last modification date of May 13, 2009. The script in the document does not detect indexes that are candidates for a rebuild – the script simply outputs SQL statements to rebuild every index that is not owned by SYS or SYSTEM (and a small number of other users) into an INDX tablespace with a specified initial, next, and maxextents clause.
- Recipe 18-13, in the Note section, states that when a primary key constraint is created Oracle will create a unique index with the same name as the constraint. That statement is true unless a unique or non-unique index already exists for the column or a composite index that has that column as its leading column exists. Later in the recipe it was stated that the DBA could pre-create the index that will be used to help enforce the primary key constraint, but the Note section might cause a little confusion. [Reference/Test Case - Check the Comments Section Also]
- Recipe 19-1, the join between V$SQL_MONITOR and V$SQL is incomplete and will result in repeated rows if there are multiple child cursors for a SQL_ID, which seems to happen frequently starting with Oracle Database 18.104.22.168 due to adaptive cursor sharing, and 22.214.171.124 due to cardinality feedback. Additionally, V$SQL_MONITOR will only be populated if the CONTROL_MANAGEMENT_PACK_ACCESS initialization parameter is set to DIAGNOSTIC+TUNING to enable the extra cost licenses (Tuning Pack, for example). [Reference/Test Case]
- Recipe 19-3 states “the V$SESSION_LONGOPS view gives you a rough idea of how long a currently running query has left to finish.” Technically, this statement is both correct and incorrect. The view provides a rough idea, based on the execution plan estimates, of how much time is left for a specific stage (step/line) in the execution plan. Entries in the V$SESSION_LONGOPS view will appear and disappear as the different stages in the execution plan execute. Most index type accesses will likely complete in less than six seconds per access, and probably will not appear in this view. [Reference/Test Case - Check the Comments Section Also]
- Recipe 19-4, the query may produce misleading results as the query is not time-scoped. The query does not consider that the database may have been up for several years – a query that on average consumes one second of CPU time, but was executed 100 million times since the library cache was last purged will likely be flagged as a problem, while a query that on average consumes 1,000 seconds of CPU time but has only executed 1,000 times in the last 24 hours probably will not appear in the SQL results.
- Recipe 19-7 states that a PLAN_TABLE must exist, and that if it does not exist the utlxplan.sql script should be run to create a PLAN_TABLE in the user’s schema. As of Oracle Database 10.1.0.1 [Edit: it appears that this change happened prior to 10.1.0.1] user schemas should no longer have a PLAN_TABLE in their schema – a global temporary table owned by the SYS user is used instead. [Reference/Test Case - Check the Comments Section Also]
- Chapter 19 demonstrates the use of EXPLAIN PLAN, AUTOTRACE, and the EXPLAIN option of TKPROF, but the chapter does not mention that using those methods for displaying the execution plan could show the wrong execution plan. [Reference/Test Case - Check the Comments Section Also]
- Recipes 19-10 and 19-12 suggest that a goal of tuning is reducing the calculated COST value that appears in the execution plan, and this was accomplished in the recipes by adding an index or using an index hint. This is not a good approach to tuning; it is much better to rely on actual response time rather than the calculated cost estimates. [Reference/Test Case - Check the Comments Section Also] [Reference/Test Case - Check the Comments Section Also]
- Recipe 20-1, the SQL statements are not time-scoped, and even if the database has only been open for a short period of time, the SQL statement could yield incorrect results because the “CPU used by this session” statistic is not updated in real-time, only when a SQL statement completes, or a fetch completes. [Reference/Test Case - Check the Comments Section Also]
- Page 500 suggests inspecting V$SESSION to determine the type of block contention that causes buffer busy waits. It would have been nice to see the authors provide a demonstration of this technique because from what I am able to determine, the chances of a success are extremely small using V$SESSION (I assume that the authors might be suggesting to look at either the combination of P1, P2, and P3 or the combination of ROW_WAIT_OBJ#, ROW_WAIT_FILE#, and ROW_WAIT_BLOCK#).
- Page 500 states that the “SQL*Net message from client” wait event indicates possible network or middle-tier bottlenecks. Client-side “think time” is probably the most significant contributor to this wait event, and I think that it would be very difficult to demonstrate that this wait event indicates network bottlenecks. [Reference/Test Case] [Reference/Test Case] [Reference/Test Case - Check the Comments Section Also]
- Recipe 20-8 suggests an approach to identify those indexes that are safe to drop by enabling monitoring of the indexes. One of the problems with this approach is that the index may still be used by the cost-based optimizer even when monitoring reveals that the index is never used. [Reference/Test Case - Check the Comments Section Also] [Reference/Test Case - Check the Comments Section Also]
- Very few cross-recipe (forward and backward) references were provided, which makes the sections of the book appear to be disconnected and stand-alone.
The book attempts to cover a lot of ground. While it probably requires the authors between one and eight hours per page to research, write, proof-read, test, and format a typical page in an Apress book, the authors probably could have controlled the scope of the book a little better and spent a little more time to test and expand some of the sections of the book – the sections that discuss analytic functions, for example. I did not spend a lot of time reading through the chapters “Working with XML Data”, “Partitioning”, or “LOBS”, so I did not comment on those sections of the book. The majority of the book is exceptional – far better than any book that is strictly a SQL reference type book, and I would not hesitate to recommend the book to others who are trying to figure out: “What do I do with SQL.”
I will probably update this review at a later time to supply links to test cases or articles for most of the problems that were identified with the book. I might also take a closer look at the “Working with XML Data”, “Partitioning”, or “LOBS” chapters, and provide feedback for those sections of the book.
(Section added December 13, 2010): Related blog articles that discuss the contents of this book: