Book Review: Pro Oracle SQL

17 01 2011

January 17, 2011

Hard-Hitting, Beyond the Basics Advice for Breaking the Black Box Approach to Database Programming, Leveraging the SQL Language on Oracle Database, and Improving Overall Performance of the SQL that You Write

I pre-ordered this book in October 2010 while searching for a way to learn some of the more advanced features of Oracle Database’s SQL; I have been searching for a follow up book to the “Mastering Oracle SQL and SQL*Plus” book that I read a couple of years ago, and I think that I finally found that follow up book.  Written as the Oracle Database 11.2 sequel to the book “Mastering Oracle SQL and SQL*Plus” (and that book’s much updated revision titled “Beginning Oracle SQL”), this book is not written as a simple reference for the Oracle specific SQL dialect.  This is not a book that you will want to use when trying to learn the basics of creating a simple SQL statement with a four table join.  However, if you are able to create that simple SQL statement with a four table join in less than 30 minutes with the help of an entity-relationship diagram (ERD), and without using a DISTINCT clause, this book’s contents will prove to be a valuable resource not only to expand your depth of knowledge of the SQL language, but also to extract the extra value that becomes visible when Oracle Database is viewed as much more than a black box dumping ground for data. 

The authors of this book are all OakTable Network members and have between 15 and 29 years of experiencing working with Oracle products.  This long term exposure to Oracle Database is quite clearly an advantage when discussing many of the subtopics that are simply missed by other Oracle specific SQL language references.  While there was no discussion of the SQL language from Oracle Database 2.0, the authors are clearly comfortable with the features in Oracle Database 11.2 as well as the features that are available in Oracle Database 8i, 9i,10g, and 11g R1.

The book’s contents are well organized.  While each chapter identifies the author who wrote the chapter (possibly indicating that the authors did not work together on each individual chapter), the book flows well with plenty of forward and backward references between chapters, as well as including references to other resources (Metalink, books, and blogs).  Chapter one of the book is a bit different from the rest of the chapters in the book, and appears to be written as a transition area for readers to become familiar with SQL*Plus and Oracle Database.  Chapter one will be valuable to readers attempting to adjust from using graphical query tools (TOAD, SQL Developer, Excel, or any number of other graphical tools) to using SQL*Plus.  Additionally, that chapter helps readers who are familiar with other SQL dialects (such as that used by SQL Server) take advantage of Oracle Database’s special characteristics, and introduces readers to multi-table INSERTs and MERGE operations.  Understanding the information presented in execution plans is an emphasis in the remaining chapters of the book; this book’s explanation of execution plan content ranks among the best, if not the best, that I have seen to date.  While there is not a single cartoon drawing in the book, and it does not appear that any sections of this book were borrowed from other books, there are several very well placed diagrams in the book’s chapters.  The book makes a significant effort to control the scope of the material presented.  In most cases, that effort resulted in a very easy to understand, yet thorough discussions of complex topics while building bridges to help the reader transition into Oracle performance specific books, such as the book “Troubleshooting Oracle Performance”.  In a couple of areas, adding an additional half-dozen words might have saved the reader a little confusion, but those are rare occurrences in this book.  Be certain to install Oracle Database’s sample schema, and download the script library for this book from the Apress website (some of the scripts, especially those in chapter 16, are quite useful and may not be printed directly in the book).

Foundation knowledge, and miscellaneous comments while reading the book:

  • Pages 3-8: Provides a quick demonstration of how to connect to the database using SQL*Plus, which is helpful for people who have primarily used other query tools, or other database platforms.  Also demonstrates the basics of executing SQL statements, setting up the SQL*Plus environment, and executing scripts.
  • The book briefly touches on why it is important to use consistent formatting and bind variables in order to reduce the number of hard parses, and why writing SQL to limit logical IO is also important.
  • Page 40: Automatic query transformation often takes place, for instance, converting an IN subquery into a standard join.
  • Chapter 2 tries to break open the black box approach to writing SQL statements.
  • The book makes effective use of analogies.
  • Page 59: The book demonstrates that the array fetch size has an impact on the number of consistent gets performed when executing a query.
  • Pages 64-65: Includes a test case that shows why an index would be used in one case to retrieve 1% of the table rows, while in another case a full table scan was more appropriate to retrieve 1% of the rows.
  • Page 66: Nice summary of what controls the number of blocks that are read in a single read call during a full table scan or a fast full index scan.  There is one potentially confusing sentence, “This could mean that a multiblock read might only read one block at a time.”  The concepts were correctly stated, however it might be worthwhile to state that “a multiblock read might be truncated to a single block read due to the blocks that are already in the buffer cache.”
  • Pages 74-75: Describes how B*tree indexes grow from a single block in size, and what triggers the index height to increase.
  • Long code sections in the early chapters with embedded comments – probably OK for the intended audience.
  • Describes and lists when the various join methods (NESTED LOOPS, SORT-MERGE, HASH, CARTESIAN) are appropriate.
  • Pages 120-127: Explains how NULL values behave in potentially unexpected ways in IN, NOT IN, UNION, UNION ALL, INTERSECT, MINUS, GROUP BY, ORDER BY, COUNT, SUM, AVG, MIN, and MAX.
  • The book demonstrates through several examples that there is more than one way to build a SQL statement to answer a specific question, but not all methods are equally efficient.
  • Page 153: States that EXPLAIN PLAN only shows the estimated plan, while the actual plan may be different.
  • Page 160: Provides a demonstration that shows EXPLAIN PLAN sometimes shows the wrong execution plan.
  • Page 171: Provides a warning to SET SERVEROUTPUT OFF before displaying the execution plan using DBMS_XPLAN.DISPLAY_CURSOR.
  • Page 175: Describes all of the format parameters for DBMS_XPLAN, including the undocumented ADVANCED parameter.
  • Page 187: Example of collecting statistics on a table and its indexes without creating histograms.
  • Page 191: States that using an account with DBA privileges simplifies the process of using DBMS_XPLAN.DISPLAY_CURSOR, using data from other accounts, and selecting from system views, but advises to do so only in a test environment.
  • The book typically indicates the first Oracle release version that supports a feature that is being described.
  • Page 218: Includes a test case that seems to demonstrate an error in the Oracle Database 11.2 documentation.
  • Pages 223-224: Very helpful brief summary of most of the analytic functions that are discussed in the chapter – this list reduces the frustration in finding the correct analytic function that is needed to produce a desired result.
  • Page 225: Demonstrates how to create a running sum that automatically resets when the value of a column changes.
  • Page 244: Presence of the keywords WINDOW SORT in an execution plan indicates that the SQL statement uses an analytic function.
  • The KEEP keyword is not mentioned in the chapter that describes analytic functions, however an example is provided on page 173 of the book in the pln.sql script (in the script download for chapters 5 and 15).
  • Page 283: WITH clause is known as subquery factoring, and other database platforms (and the ANSI standard) refer to subquery factoring as common table expression.
  • Page 286: Oracle may process a WITH block as either an inline view or as a temporary table.
  • Book describes several methods for optimizing query performance, and suggests revisiting queries created in older release versions of Oracle Database to take advantage of performance optimizations that are available in more recent release versions.  Also suggests revisiting PL/SQL code to determine if such procedural code may be accomplished with set-based plain SQL code.  Emphasizes testing for performance, rather than just using the first query that seems to produce the expected result.
  • Describes SQL features through version 11.2 (as does the book “Beginning Oracle SQL”).
  • Page 309: Table 10-1 quickly describes the various functions, operators, and pseudo columns that are related to the Oracle CONNECT BY syntax, including SYS_CONNECT_BY_PATH, CONNECT_BY_ROOT, CONNECT_BY_ISCYCLE, CONNECT_BY_ISLEAF, PRIOR, LEVEL, and  NOCYCLE.
  • Page 331-334: Provides various demonstrations of common mistakes with queries containing EXISTS clauses.
  • Page 336-339: Demonstration that EXISTS and IN queries may be automatically rewritten into equivalent SQL statements and suggests looking in a 10053 trace file to see the actions performed by the optimizer.
  • Page 349: NOT IN and NOT EXISTS may return different results, and are not functionally equivalent due to the different handling of NULL values.
  • Page 355: Prior to 11g, anti-joins could not be performed on NOT IN queries unless the optimizer was sure that NULL values could not be returned.
  • Page 378: Shows how to index NULL values in a B*Tree index by using a constant as the second column in the index.
  • Page 388: Using DBMS_ROWID.ROWID_OBJECT(ROWID) to show how table rows are distributed with hash partitioning.
  • Page 388: Using ORA_HASH( column_name, 31, 0) to predict the partition into which a row will be placed in a hash partitioning scheme with 32 partitions.
  • Page 392: Function based indexes add a virtual column to the table, and those columns are viewable through the DBA_TAB_COLS view.
  • Page 393: Shows how to create a virtual column that always returns a specified calculated value.
  • Page 397: Starting in Oracle Database 11g it is possible to alter an index to make it invisible – this can be used to reduce the risks associate with just dropping the index.
  • Page 403: Direct path inserts are invoked with the APPEND hint, causing rows to be inserted above the high water mark.
  • Page 408: DBMS_ERRLOG.CREATE_ERROR_LOG procedure creates a logging table that is the destination for rows that fail during an INSERT, UPDATE, or DELETE when the LOG ERRORS INTO clause is included in the INSERT, UPDATE, or DELETE.
  • Pages 418-421: Demonstrates with a test case that when a large percentage of rows in a table need to be updated, creating a new table and using INSERT APPEND may be much faster than a typical INSERT statement if a little down time is acceptable during the creation of the new table.
  • Page 434: Definition of ACID (Atomicity, Consistency, Isolation, Durability)
  • Page 436: ANSI isolation levels explained.
  • Page 482: Demonstrates one of the potential problems associated with using statically defined views in a query, where a query is later extended to return additional information by joining directly to one of the base tables specified in the view.
  • Page 498: Oracle 10g and 11g use rolling invalidation of cursors when statistics are collected on the objects referenced by the cursors.
  • Chapter 15: Instrumenting code, comparing execution plan pre-change and post-change to determine the performance impacts of changes, regression testing of code changes, identifying methods to cause applications to fail.
  • Page 501: Mentions V$SQL_OPTIMIZER_ENV and the underlying structure X$KQLFSQCE.
  • Page 505: Very good explanation of adaptive cursor sharing.
  • Page 510: Identifying SQL statements with large swings in execution time.
  • Page 514: Bind variables are not appropriate in all situations.
  • Page 514: “Hints are actually directives to the optimizer. As long as the hint is valid, the optimizer will obey it.”
  • Page 516: Placing the word COMMENT in front of a set of hints in a SQL statement prevents the optimizer from using the hints that follow.
  • Page 517: Script to extract the OTHER_XML column from V$SQL.
  • Page 527: Example of creating a stored outline from a SQL statement in the library cache.
  • Page 538: SQL Profiles do not lock an execution plan in place – instead they lock a cardinality estimate adjustment in place through the use of embedded hints.  It is also possible to embed specific hints into SQL statements by creating a SQL profile using the undocumented DBMS_SQLTUNE.IMPORT_SQL_PROFILE function.

Execution Plan Operations Described (after chapter 6 “SQL Execution Plans”):

  • FILTER (page 195)
  • GENERATE CUBE (page 198)
  • WINDOW SORT (page 244)
  • SQL MODEL ACYCLIC (page 272)
  • SQL MODEL CYCLIC  (page 273)
  • SQL MODEL ORDERED (page 274)
  • COUNT STOP KEY (page 385)

 Analytic and Other Advanced SQL Constructs Described:

  • CUBE function (page 197)
  • GROUPING (page 209)
  • GROUPING_ID (page 212)
  • SUM analytic function (page 225)
  • MAX analytic function (page 225)
  • LAG analytic function (page 227)
  • LEAD analytic function (page 229)
  • FIRST_VALUE analytic function (page 231)
  • LAST_VALUE analytic function (page 231)
  • NTH_VALUE analytic function (page 232 – new in Oracle Database 11.2)
  • RANK analytic function (page 234)
  • DENSE_RANK analytic function (page 235)
  • ROW_NUMBER analytic function (page 236)
  • RATIO_TO_REPORT analytic function (page 237)
  • PERCENT_RANK analytic function (page 238)
  • PERCENTILE_CONT analytic function (page 238)
  • PERCENTILE_DISC analytic function (page 240)
  • NTILE analytic function (page 241)
  • STDDEV analytic function (page 242)
  • LISTAGG analytic function (page 243 – new in Oracle Database 11.2)
  • MODEL clause (page 253)
  • PIVOT clause (page 284)
  • SYS_CONNECT_BY_PATH (page 311)
  • CONNECT_BY_ROOT operator (page 313)
  • NOCYCLE parameter (page 316)
  • CONNECT_BY_ISCYCLE pseudo column (page 316)
  • CONNECT_BY_ISLEAF pseudo column (page 319)

Hints Described:

  • NO_MERGE (page 42)
  • MERGE (page 43)
  • NO_PUSH_PRED (page 49)
  • REWRITE (page 51)
  • FULL (page 86)
  • ORDERED (page 91)
  • USE_NL (page 91)
  • PARALLEL (page 277)
  • INLINE (page 286)
  • MATERIALIZE (page 286)
  • SEMIJOIN (page 339)
  • NO_SEMIJOIN (page 339)
  • ANTIJOIN (page 363)
  • USE_ANTI (page 363)
  • NL_AJ (page 363)
  • HASH_AJ (page 365)
  • INDEX (pages 374, 551)
  • APPEND (page 403)
  • APPEND_VALUES (page 403 – new in Oracle Database 11.2)
  • QB_NAME (page 516)
  • OPT_ESTIMATE (page 538)

Parameters Described:

  • _UNNEST_SUBQUERY (page 47)
  • _ALWAYS_SEMI_JOIN (page 342)
  • USE_STORED_OUTLINES (page 529)
  • SQLTUNE_CATEGORY (page 539)

Miscellaneous Useful Functions:

  • ORA_HASH function (pages 95, 388 – used to predict the partition into which a row will be placed in a hash partitioning scheme)
  • DBMS_RANDOM.VALUE (page 473)

While the “Pro Oracle SQL” book is an overall excellent book, clearly demonstrating that there was a great deal of care put into the book, there are a couple of small problem areas in the book (note that the authors appear to have addressed most of these issues on the errata page for the book on the Apress website):

  • While not significantly affecting the comprehension of the material presented in the book, there are a couple of typos in the book.  For instance, on page 9 the book states “In this book” rather than “In this chapter”; page 31 of the book uses the word “effect” rather than “affect”; and page 221 of the book demonstrates that simple connecting words were sometimes lost, “… would require multiple self-joins to [the] employees table.”  None of these typos change the intended message of the paragraphs.  However, the writing style in some of the chapters is significantly more fluid than in other chapters.
  • Pages 51-52: When the optimizer did not select to automatically use the materialized view, the author used a REWRITE hint to force the optimizer to use that materialized view.  It probably would have been a good idea to mention that the optimizer likely did not automatically use the materialized view because the calculated cost for that execution plan (1935) exceeded the cost for the execution plan that directly accessed the tables (485).
  • Page 73: States, “Block accesses made via an index scan are made using single-block reads.”  Technically, index range scans may employ multi-block reads in certain cases (such as index pre-fetching), but stating that at this point in the book might lead to unnecessary confusion.
  • Page 85: A missing word might lead to misunderstanding: “However, in the cases where the number of subindexes needed would be smaller, the operation can be many times more efficient than a full scan as scanning [a] smaller [number of] index blocks can be more efficient than scanning [a] larger [number of] table blocks.”
  • Page 94: Missing words in an explanation regarding how hash joins work might lead to misunderstanding: “Based on table and index statistics, the table that is determined to return the fewest rows will be hashed in its entirety into memory. This hash table includes all the row data for that table…”  Based on testing, it appears that the hash table only includes the selected columns and the columns that are joined to other row sources.  Additionally, the row source that is hashed into the hash table might not be from a table.  Including this additional detail might be too much information for the scope of the book (reference).
  • Page 102: There is a risk that the full outer join Oracle syntax equivalent implementation, when the approach is applied to other data, could yield different results from the ANSI full outer join if each row returned is not unique.  One way to work around that limitation is to replace the UNION with a UNION ALL and add AND E2.ROWID IS NULL to the final WHERE clause.
  • Page 163: States, “The operation [in the execution plan] that is most indented is actually the first operation that will be executed. If there are multiple operations at the same level, the operations are executed in a top-down order.”  The Oracle Database Performance Tuning Guide for 11.2 from the Oracle documentation library states essentially the same fact, and is also incorrect.  This guideline is correct in some cases, but incorrect in other cases as can be confirmed with a 10046 extended SQL trace.  The first operation executed in an execution plan is actually the first operation from the top of the execution plan that has no child operations (reference reference2).
  • Page 379: States, “B-tree indexes are suitable for columns with lower selectivity. If the columns are not selective enough, the index range scan will be slower. Further, less selective columns will retrieve numerous rowids from the leaf blocks leading to excessive single block access to the table.”  Based on the second and third quoted sentences, it appears that the author intended to state that “B-tree indexes are GENERALLY NOT suitable for columns with lower selectivity.”
  • Page 397: States, “There is another use case for the invisible indexes. These indexes are useful to reduce the risk while dropping unused indexes… From Oracle Database version 11g onwards, you can mark the index as invisible, wait for few weeks, and then drop the index if no process is affected with less risk.”  It is important that the book states “less risk” and not that there is no risk – it would have been helpful if the book discussed what risks remain (but again this might exceed the intended scope of the book).  Just because an index is not used for a couple of weeks does not mean that the index will not be used during month-end closing, year-end processing, or some other infrequently occurring activity.  Invisible indexes on foreign key columns may still be used to prevent table locking problems on the child table when the parent’s primary key columns are updated.  Additionally, the statistics from the invisible indexes may be used by the optimizer to more accurately determine cardinality estimates (reference reference2).
  • Page 437: The third paragraph contains an apparent typo regarding the location of the online redo logs: “Undo blocks retain the before condition of the data, while the redo information is stored in the online redo logs in the system global area (SGA).”
  • Page 484: Specifies 10044 tracing when 10046 tracing was likely intended.  The third paragraph states, “I’ve also added code to enable 10044 tracing…”  Event 10044 appears to enable tracing of free list undo operations – I believe that the author intended to write, “I’ve also added code to enable 10046 tracing…”
  • Page 529: The outline_startup_trigger.sql script is not included in the script library for the book.
  • Page 539: The create_tuning_task.sql and accept_sql_profile.sql scripts are not included in the script library for the book.

As indicated by the above, most of the problem areas are related to unintentional word substitutions.  The majority of the other problem areas are cases where the authors had to draw the line of discussion at a selected level of detail in order to limit potential confusion and control the scope of the book.  The remaining problem areas are minor in the overall context of the book, and might elicit a response from the authors along the lines of “I thought that I modified that phrase in draft 20 of the chapter”.  A fantastic Oracle SQL book that ventures well beyond simple syntax diagrams, and the book is an excellent value for the price.

Related blog articles that discuss the contents of this book:

  • Hash Joins – What is Wrong with this Statement?
  • ANSI Full Outer Join, Ready or Not?
  • Analytic Functions – What is Wrong with this Statement?
  • NO_QUERY_TRANSFORMATION Hint is Ignored… Well, Almost Ignored
  • Adding Comments to SQL Statements Improves Performance?

  • Actions


    %d bloggers like this: