Faulty Quotes 7 – Deadlock Kills Sessions?


January 4, 2012

(Back to the Previous Post in the Series)

I thought that I would begin this blog article with a bit of a test case that demonstrates a very simple deadlock between two sessions.

In Session 1 (create the test table with a primary key and insert a single row):

CREATE TABLE T1 (
  C1 NUMBER PRIMARY KEY,
  C2 VARCHAR2(10));

INSERT INTO T1 VALUES (1,'1');

In Session 2 (insert a row that does not conflict with session 1 and then insert a row with the same primary key value as the uncommitted row in Session 1):

INSERT INTO T1 VALUES (2,'2');

INSERT INTO T1 VALUES (1,'3');

(Session 2 hangs while waiting for Session 1 to COMMIT or ROLLBACK)

In Session 1 (insert a row with the same primary key value as the uncommitted row in Session 2):

INSERT INTO T1 VALUES (2,'2');

(Session 1 hangs while waiting for Session 2 to COMMIT or ROLLBACK)

In Session 2 (an ORA-00060 is returned, deadlock between the two sessions ends):

INSERT INTO T1 VALUES (1,'3')
            *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource

(Session 1 still hangs while waiting for Session 2 to COMMIT or ROLLBACK the first row that was inserted)

In Session 2 (confirming that the session was not killed/terminated and that the first row inserted still exists – indicates that the current TRANSACTION was not rolled back, just the statement that was involved in the deadlock):

SELECT
  *
FROM
  T1;

        C1 C2
---------- ----------
         2 2
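
From Session 2 (or from a third session with access to the V$ views) it is also possible to confirm that both sessions survived the deadlock and that Session 1 is still blocked – a minimal sketch (the SID values will of course differ from system to system):

SELECT
  SID,
  TYPE,
  ID1,
  ID2,
  LMODE,
  REQUEST,
  BLOCK
FROM
  V$LOCK
WHERE
  TYPE IN ('TX','TM')
ORDER BY
  SID;

The row for Session 1’s SID showing LMODE=0 and REQUEST=6 indicates that its final INSERT is still waiting to acquire Session 2’s TX lock in exclusive mode, while Session 2’s TX lock row shows BLOCK=1.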

While recently reading an OTN thread I was reminded of a bit of information about Oracle deadlocks that several sources simply misstate.  The incorrect information is repeated, seemingly quite frequently, in various books and forum threads.  As luck would have it, several of the books that I have read about Oracle Database also repeat this incorrect information.  For instance, a quote from page 352 of the book “Expert Oracle Database 11g Administration” (a book that I reviewed and gave 4 out of 5 stars – maybe I need to re-evaluate that rating?):

“This is a catch-22 situation, because the stalemate can’t be broken by either session unilaterally. In such circumstances, Oracle steps in, terminates one of the sessions, and rolls back its statement. Oracle quickly recognizes that two sessions are deadlocked and terminates the session that issued the most recent request for a lock. This will release the object locks that the other session is waiting for.”

A second book, “Secrets of the Oracle Database” (another book that I reviewed and gave 4 out of 5 stars, actually it was probably worth about 4.4 stars), states the following on page 57:

“Hence, the ORACLE DBMS detects circular chains pertaining to interdependent locks, signals the error ‘ORA-00060: deadlock detected while waiting for resource’, and rolls back one of the sessions involved in the would-be deadlock.”

A third book, “Oracle SQL Recipes” (another book that I reviewed and gave 4 out of 5 stars), states the following on page 217:

“The immediate solution to the problem requires no user intervention: Oracle detects the deadlock and automatically rolls back one of the transactions, releasing one of the sets of row locks so that the other session’s DML will complete.”

A fourth book, “Oracle Database 11g Performance Tuning Recipes” (another book that I reviewed and gave 2 out of 5 stars), states on page 227:

“While the database can automatically resolve deadlocks between sessions (by killing one of the sessions holding a needed lock)…”

The Google book search found that the above four books are not alone in making statements about deadlocks that do not match the test case results at the start of this blog article.  One of the books found by Google is one that I have not read yet (and probably never will), “Oracle PL/SQL for Dummies”, which states the following on page 298:

“This is a deadlock. Oracle resolves it by raising a deadlock exception (ORA-00060: deadlock detected while waiting for resource) in user 1’s session. This terminates the procedure and allows user 2’s procedure to successfully complete.”

A second book found by the Google search is a bit old, “Oracle 9i DBA 101”, which on page 194 states:

“A deadlock occurs when two or more users wait for data that is locked by each other. Oracle will kill the first server process that detects the deadlock, so you should always look for and fix code that can cause deadlocks.”

There are certainly many other statements in other books regarding deadlocks, both correct and incorrect.  A quick search through the OTN forums found several threads in addition to the one at the start of this blog article that also drifted a bit from an accurate picture of what happens when a deadlock is broken in Oracle Database.

https://forums.oracle.com/forums/thread.jspa?threadID=2196282

“That is the deadlock. Oracle will then immediately choose arbitrarlly a victim session (542 or 548) and kill its process letting the remaining session continuing its work.”

https://forums.oracle.com/forums/thread.jspa?threadID=2152646

“There should be a simple wait, not a deadlock which would killed one of the session.”

https://forums.oracle.com/forums/thread.jspa?threadID=2261367

“If two sessions are actually deadlocked, Oracle should always kill one of the blocked sessions, raising an ORA-00060 error. It sounds like you’re saying this is not the expected behavior.”

https://forums.oracle.com/forums/thread.jspa?threadID=2227420

“The two session are inserting records in the same table with a unique index on it. With the same database version on another server and the same treatment, the deadlocks are correctly handled (one session is killed with ora-00060) and the process can continue.”

https://forums.oracle.com/forums/thread.jspa?threadID=1072665

“In oracle, when session1 and session2 make a deadlock, the oracle will kill one session , but I want know how to let oracle kill session1 and session2 both, is there way to do that?”

https://forums.oracle.com/forums/thread.jspa?threadID=2169563

“NO. Oracle already broke the deadlock by terminating one of these sessions. By the time trace file was closed, the deadlock no longer existed.”

https://forums.oracle.com/forums/thread.jspa?threadID=953308

“Oracle itself resolve the deadlock by killing the session.”

While reading the book “Oracle Core: Essential Internals for DBAs and Developers” I almost flagged a sentence in a discussion of deadlocks as potentially containing an error.  That statement is found on page 82, near the end of the third to last paragraph on the page.  After reading that paragraph about 10 times I decided that the statement is probably specific to TM locks, and is probably correct (the thought of TX locks maintained at the individual block level that contains the table rows kept running through my head the first nine times I read the paragraph).  This section of the book is interesting, as it not only avoids the mistakes mentioned in the above books, but also highlights an inaccurate statement or two found in the Oracle Database documentation.





True or False – Optimizer Mode and Tuning Goals


May 13, 2010

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Continuing the true or false blog article series with more questions that seem to have conflicting answers.  Today’s quiz is on the topic of the optimizer mode and tuning goals, inspired by a recent thread that I found in a discussion forum.  A Google search found a couple of related articles.

When answering the questions, answer True or False, followed by a justification for the answer that you provided.

The Articles:

The Quiz:

1.  As of Oracle 10g R1, there are three methods for joining tables or other row sources: star transformation join, nested loops join, and hash join.

2.  SQL execution speed will be fastest when the first join between two row sources produces the smallest possible result set from the specified row sources.

3.  Oracle Database 10g’s SQL Profiles and SQL Access Advisor help to change execution plans for queries without adding hints.

4. Oracle Database 10g offers three optimizer modes, controlled by the OPTIMIZER_MODE initialization parameter: RULE, COST-BASED, and CHOOSE.

5.  An effective approach for query tuning involves enabling the RULE based optimizer either through a RULE hint, or by changing the session-level optimizer goal with the following SQL statement:

    ALTER SESSION SET OPTIMIZER GOAL = RULE;

6.  The default optimizer mode in Oracle 10g R1 is CHOOSE, which means that the RULE based optimizer will be used if none of the objects referenced in the SQL statement have statistics.

7.  A SELECT statement requiring 10 seconds and performing 10,000 DB_BLOCK_GETS is automatically less efficient than a transformed version of the SELECT statement that performs 2,000 DB_BLOCK_GETS.

8.  The FIRST_ROWS optimizer mode is appropriate for OLTP type systems because it allows the small result sets that are expected by the end users to be returned as quickly as possible.

9.  The RULE based optimizer does not support any features introduced in Oracle since the mid 1990s.

10. Specifying a value for the OPTIMIZER_INDEX_CACHING parameter is recommended, because the value is critical for costing nested loops joins.  Its value should be set to indicate the percentage of an index that is likely to remain in Oracle’s buffer cache.

11. When the PARALLEL_AUTOMATIC_TUNING parameter is set to ON, the cost-based optimizer will artificially increase the calculated cost of index access paths, which increases the frequency of full table scans that may be executed in parallel.

12. With the RULE based optimizer, the driving table is the last table listed in the FROM clause of the SQL statement.  When the Cost-Based optimizer is used, the driving table is the first table listed in the FROM clause of the SQL statement.

13. The Cost-Based optimizer often performs unnecessary full table scans when three or more tables are listed in the FROM clause of a SQL statement.

14. Setting the OPTIMIZER_MODE parameter at the session level to FIRST_ROWS_10 instructs the query optimizer that the SQL statement which will be submitted next will return 10 rows.
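
Rather than relying on articles alone, several of the above questions can be checked directly in SQL*Plus – a minimal sketch for inspecting and changing the optimizer mode at the session level (assumes sufficient privileges to query V$PARAMETER):

SHOW PARAMETER OPTIMIZER_MODE

SELECT
  VALUE
FROM
  V$PARAMETER
WHERE
  NAME = 'optimizer_mode';

ALTER SESSION SET OPTIMIZER_MODE = ALL_ROWS;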

I fear that this blog series might be falsely inflating the Google ranking of the Oracle documentation library due to the number of times the documentation library has been referenced – sorry Google users.  🙂





True or False – Hash Joins


May 12, 2010

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Continuing the true or false blog article series with more questions that seem to have conflicting answers.  Today’s quiz is on the topic of the hash joins, inspired by a recent thread that I found in a discussion forum.

When answering the questions, answer True or False, followed by a justification for the answer that you provided.

The Articles:

The Quiz:

1.  A hash join is ideal when joining a small table to a large table, and in such cases a hash join is typically faster than a nested loops join or a sort-merge join.

2.  When presented with very large tables in a production database, the Oracle query optimizer will attempt to avoid using hash joins unless the PGA_AGGREGATE_TARGET or HASH_AREA_SIZE (if PGA_AGGREGATE_TARGET is not used) is increased to a large value.

3.  When a PGA_AGGREGATE_TARGET is in use, one session cannot use all of the memory specified by PGA_AGGREGATE_TARGET nor all of the available memory in the server, whichever is the lower value.

4.  A hash join may use only 200MB of memory, or 5% of the PGA_AGGREGATE_TARGET, whichever is the lower value.

5.  The query optimizer will not select to use a hash join unless the hash join is able to complete in memory, as controlled by the PGA_AGGREGATE_TARGET and/or HASH_AREA_SIZE parameter values.

6.  A hash join is only faster than a nested loops join when the memory available for the hash join is at least 1.6 times the sum of bytes in the rows of the table to be hashed.

7.  A hash join’s performance is affected when the hash join overflows the memory established by the HASH_AREA_SIZE parameter, and the hash join spills to the TEMP tablespace.

8.  The degree of parallelism for a table (or individual SQL statement) must be adjusted when a full table scan is used to retrieve the rows during a hash join.  Hash joins typically require the parallel degree to be set to a value greater than 1 in order to improve performance over other join methods.

9.  The _PGA_MAX_SIZE parameter should be adjusted to improve performance when joining large tables with a hash join.

10. The Oracle query optimizer will commonly ignore USE_HASH hints.

11. The OPTIMIZER_INDEX_COST_ADJ parameter and OPTIMIZER_MAX_PERMUTATIONS parameters control the optimizer’s decision to use hash joins.

12. When the PGA_AGGREGATE_TARGET parameter is not used, the HASH_AREA_SIZE parameter should be set to 50% of the square root of the bytes in the smaller of the two row sources being joined, with a minimum value of 1MB.

13. The row source listed below the words HASH JOIN in an execution plan is the source for the hash table – the optimizer typically tries to select the smaller estimated row source of the two row sources as the source for the hash table. The second row source listed below the words HASH JOIN in an execution plan is fully scanned, probing the generated hash table in search of a match.
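
One way to experiment with several of the above questions is to hint a hash join and then examine the actual execution statistics – a minimal sketch, assuming two test tables T_SMALL and T_BIG of your own creation, joined on a common C1 column:

SELECT /*+ GATHER_PLAN_STATISTICS USE_HASH(B) */
  COUNT(*)
FROM
  T_SMALL S,
  T_BIG B
WHERE
  S.C1 = B.C1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

The Used-Mem column in the resulting plan output shows how much workarea memory the hash join consumed, and whether the workarea execution was optimal, one-pass, or multi-pass.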

When attempting to answer the true or false questions, think about under what conditions the answer to the question will be true and under what conditions the answer to the question must be false.





True or False – NVL


May 11, 2010

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Continuing the true or false blog article series with more questions that seem to have conflicting answers, if the answers to those questions depend on the source of the information found through Internet searches.  Today’s quiz is on the topic of the NVL function, inspired by a recent thread that I found in a discussion forum.

When answering the questions, answer True or False, followed by a justification for the answer that you provided.

The Articles:

The Quiz:

1.  The most common use of the NVL function is to prevent division by zero errors in mathematical equations.

2.  The SQL statement:

SELECT NVL(CURRENT_STATUS, "Not disclosed") FROM T1;

   replaces NULL values found in the CURRENT_STATUS column of table T1 with the phrase: Not disclosed

3.  The NVL function replaces a NULL column value with any other value.

4.  The NVL function is the most efficient method for converting NULL values to a non-NULL value.

To help you with the questions, here is a test table:

CREATE TABLE T1(
  C1 NUMBER,
  C2 NUMBER,
  C3 NUMBER,
  C4 DATE,
  C5 DATE,
  C6 DATE,
  C7 VARCHAR2(20),
  C8 VARCHAR2(20),
  C9 VARCHAR2(20),
  PRIMARY KEY(C1));

INSERT /*+ APPEND */ INTO
  T1
SELECT
  ROWNUM,
  DECODE(MOD(ROWNUM,5),0,NULL,ROWNUM),
  DECODE(MOD(ROWNUM,5),0,NULL,ROWNUM),
  TRUNC(SYSDATE)+TRUNC(ROWNUM/100),
  DECODE(MOD(ROWNUM,5),0,NULL,TRUNC(SYSDATE)+TRUNC(ROWNUM/100)),
  DECODE(MOD(ROWNUM,5),0,NULL,TRUNC(SYSDATE)+TRUNC(ROWNUM/100)),
  RPAD(CHR(MOD(ROWNUM-1,26)+65)||
    CHR(MOD(ROWNUM,26)+65)||
    CHR(MOD(ROWNUM+1,26)+65)||
    CHR(MOD(ROWNUM+2,26)+65),20,'A'),
  DECODE(MOD(ROWNUM,5),0,NULL,
   RPAD(CHR(MOD(ROWNUM-1,26)+65)||
    CHR(MOD(ROWNUM,26)+65)||
    CHR(MOD(ROWNUM+1,26)+65)||
    CHR(MOD(ROWNUM+2,26)+65),20,'A')),
  DECODE(MOD(ROWNUM,5),0,NULL,
   RPAD(CHR(MOD(ROWNUM-1,26)+65)||
    CHR(MOD(ROWNUM,26)+65)||
    CHR(MOD(ROWNUM+1,26)+65)||
    CHR(MOD(ROWNUM+2,26)+65),20,'A'))
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)

The above creates a table with 1,000,000 rows, where one of every 5 rows contains NULL values in columns C2, C3, C5, C6, C8, and C9.

For some of the quiz questions, the following script might be helpful:

SET ARRAYSIZE 1000
SET AUTOTRACE TRACEONLY STATISTICS
SET TIMING ON

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'NVL_TEST';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

SELECT /*+ GATHER_PLAN_STATISTICS */
  COALESCE(C3,C2,C1) NUM,
  COALESCE(C6,C5,C4) DAT,
  COALESCE(C9,C8,C7) VCAR
FROM
  T1;

SELECT /*+ GATHER_PLAN_STATISTICS */
  NVL(C3,NVL(C2,C1)) NUM,
  NVL(C6,NVL(C5,C4)) DAT,
  NVL(C9,NVL(C8,C7)) VCAR
FROM
  T1;

SELECT /*+ GATHER_PLAN_STATISTICS */
  NVL(C3,C1) NUM,
  NVL(C6,C4) DAT,
  NVL(C9,C7) VCAR
FROM
  T1;

SELECT /*+ GATHER_PLAN_STATISTICS */
  DECODE(C3,NULL,DECODE(C2,NULL,C1,C2),C3) NUM,
  DECODE(C6,NULL,DECODE(C5,NULL,C4,C5),C6) DAT,
  DECODE(C9,NULL,DECODE(C8,NULL,C7,C8),C9) VCAR
FROM
  T1;

SELECT /*+ GATHER_PLAN_STATISTICS */
  NVL(C3,NVL(C2,C4)) COL1,
  NVL(C6,NVL(C5,C7)) COL2
FROM
  T1;

SELECT
  SYSDATE
FROM
  DUAL;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';

It is probably best to execute the above script several times to limit the effects of physical block reads.  You may see different performance results depending on whether the 10046 trace is enabled, whether physical block reads are performed on every execution, and whether the test is executed directly on the database server.  The GATHER_PLAN_STATISTICS hint is used to permit DBMS_XPLAN.DISPLAY_CURSOR to retrieve the execution plan with the ALLSTATS LAST format parameter.
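
To see one of those execution plans, run something similar to the following immediately after the query of interest (a minimal sketch – AUTOTRACE should be switched off first, so that the query of interest remains the most recent cursor executed by the session):

SET AUTOTRACE OFF

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));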





True or False – Undo


May 9, 2010

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Continuing the true or false blog article series with more questions that seem to have conflicting answers, if the answers to those questions depend on the source of the information found through Internet searches.  Today’s quiz is on the topic of Oracle Database’s undo functionality, and in some cases you may need to consult several of the suggested articles in order to correctly answer each of the questions.

When answering the questions, answer True or False, followed by a justification for the answer that you provided.  Yes, there are a lot of questions, but part of the entertainment in these types of exercises is reading an article and identifying what is logically right or wrong.  Of course, as with the previous quizzes, any answers that attempt to utilize logical fallacies will be scored as incorrect.

The Articles:

The Quiz:

1.  Rollback segment shrinks cause the next session to use the rollback segment to wait while the rollback segment is reduced in size to its optimal size.

2.  An OPTIMAL size for rollback segments should never be set because when the rollback segments are correctly sized, they should never increase in size.

3.  The ideal rollback segment size is partially determined by the values of the DB_WRITER_PROCESSES, LOG_BUFFER, and LOG_CHECKPOINT_INTERVAL initialization parameters, as well as the COMMIT frequency.

4.  Large, long running batch jobs should be manually set to use a single, large rollback segment to reduce the risk of ORA-01555 (snapshot too old) errors.

5.  Adding a large undo segment when manual undo management is in use will effectively reduce the chances of sessions receiving ORA-01555 (snapshot too old) errors.

6.  When automatic undo management is used, ORA-01555 (snapshot too old) errors are triggered by Oracle Database creating many small undo segments in an undo tablespace that is too small.

7.  In a database with a single session connected, it is not possible to trigger an ORA-01555 (snapshot too old) error when the session is not modifying the tables that are being queried.

8.  When manual undo management is in use, only UPDATE transactions are able to automatically cause an undo segment to grow; SELECT statements do not cause the undo segments to grow, thus increasing the risk of ORA-01555 (snapshot too old) errors when the undo segments are small.

9.  When manual undo management is in use, it is a common practice for DBAs to assign a transaction to a specific rollback segment using the SET TRANSACTION command, but this is unnecessary with automatic undo management.

10. The automatically created undo segments in the SYSTEM tablespace will only be used when modifying objects residing in the SYSTEM tablespace.

11. The UNDO_TABLESPACE parameter must be set to an undo tablespace’s name when automatic undo management is used.

12. The V$WAITSTAT view includes a block class that indicates the number of waits on the system undo header blocks (a query sketch for exploring this view appears after the list).

13. The SESSIONS parameter should be set to a value no larger than the expected maximum number of user sessions since the SESSIONS parameter affects the number of undo segments created when automatic undo management is used.

14. When automatic undo management is used, as the number of sessions connected to the database instance increases, the instance dynamically adds additional (offline) undo segments in response to the sessions merely connecting.  The TRANSACTIONS_PER_ROLLBACK_SEGMENT parameter determines the number of additional sessions that must be connected before an additional undo segment is added.

15. SELECT type SQL statements do not generate redo nor undo.

16. In most databases, DML or DDL is performed in only 10% of all transactions in the database.

17. After a session changes data and issues a COMMIT, all sessions with access permissions to view the data are able to immediately see those changes.

18. Read consistency in Oracle means that data included in the query results will always be as of the moment the query was submitted.

19. Undo segments may be used to support flashback type operations.

20. The redo logs contain undo records, which allows Oracle Database to optimize the process of recovering from a shutdown abort, and also means that a future release of Oracle Database will no longer need dedicated undo tablespaces.
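
For question 12 (flagged above), the block classes tracked by V$WAITSTAT can be listed directly – a minimal sketch requiring SELECT access to the view:

SELECT
  CLASS,
  COUNT,
  TIME
FROM
  V$WAITSTAT
WHERE
  CLASS LIKE '%undo%';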

Note that a new blog article category has been added to quickly locate the quiz articles.  When attempting to answer the true or false questions, think about under what conditions the answer to the question will be true and under what conditions the answer to the question must be false.





True or False – Data Dump Import and Indexes


May 6, 2010

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

While reading one of the OTN threads that asked whether Data Pump import will automatically rebuild indexes, I noticed that someone posted an interesting response that appeared to be quoted from another source.  I searched the Internet and found the text that was copied into the OTN post, directly below a heading that read “Maximizing Oracle Import (impdp) Performance” – it was then that I decided it was time for another true or false quiz.

The reading material:

  1. http://www.dba-oracle.com/oracle_tips_load_speed.htm
  2. http://download.oracle.com/docs/cd/E14072_01/server.112/e10701/dp_import.htm
  3. http://download.oracle.com/docs/cd/E14072_01/server.112/e10701/dp_export.htm

Read through the above material and attempt to answer the following questions – as in the past, state why the answer is true or false.  Since the first article was last revised in March 2006, I probably should have referenced the documentation for Oracle Database 10g R2, but the Oracle Database 11g R2 documentation should be sufficient.

1.  The options used when data is exported with Data Pump Export (expdp) have no control over how the data may be imported with Data Pump Import (impdp).

2.  Data Pump Export supports enabling and disabling direct path for exports.

3.  Set the ANALYZE parameter of impdp to N to prevent statistics from being gathered on imported tables and indexes.

4.  Using solid state disks (SSD) will speed up import rates when the import job is otherwise fully tuned.

5.  Setting the COMMIT parameter of impdp to N suppresses the committing of the imported data until the load process ends.

6.  Setting the INDEXES parameter of impdp to N prevents indexes from being created until the import process completes.

7.  Setting the INDEXES parameter of impdp to N eliminates the index maintenance overhead when data is imported into a pre-existing table having pre-existing indexes.

8.  When the creation of indexes is postponed through the use of the INDEXES parameter of impdp, the resulting text file should be edited to set the parallel degree of each index to one less than the value of the CPU_COUNT initialization parameter’s value.

9.  Using the EXCLUDE=STATISTICS parameter value of impdp prevents the automatic gathering of object statistics by impdp.  Setting this parameter value will greatly improve impdp performance (an example invocation appears after this list).

10. When the _disable_logging initialization parameter is set to TRUE, either a SHUTDOWN ABORT or instance crash will disable/corrupt the database.  This corruption may be prevented by creating a backup of the database before modifying the value of that hidden initialization parameter.
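
For question 9, a Data Pump import invocation that uses the mentioned parameter might look like the following – a minimal sketch in which the credentials, directory object, and dump file name are entirely hypothetical:

impdp test_user/test_pass DIRECTORY=dp_dir DUMPFILE=exp_t1.dmp LOGFILE=imp_t1.log EXCLUDE=STATISTICS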

Edit – May 6, 2010: The original title of this blog article was intended to be “True or False – Data Pump Import and Indexes” – a bit of unintentional humor was caused by the typo in the article title.  There are several definitions of the word “Dump”, some of which are more fitting than others.






True or False – Direct Path Reads


April 21, 2010

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Time for another true or false pop quiz.  Based on the number of views the previous true or false quizzes have received, this is a popular series.  However, the questions might be a little too hard to answer correctly – just what is correct is not always easy to determine.  Today’s quiz is on the topic of direct path reads, sparked by one of the test cases that we set up for the Expert Oracle Practices book.

Articles:

  1. http://www.dba-oracle.com/t_direct_path_read_waits.htm (dated June 5, 2009)
  2. http://oracledoug.com/serendipity/index.php?/archives/774-Direct-Path-Reads.html (dated May 22, 2006)
  3. http://oracledoug.com/serendipity/index.php?/archives/1321-11g-and-direct-path-reads.html (dated September 24, 2007)
  4. http://dioncho.wordpress.com/2009/07/21/disabling-direct-path-read-for-the-serial-full-table-scan-11g/ (dated July 21, 2009)
  5. http://antognini.ch/2009/07/impact-of-direct-reads-on-delayed-block-cleanouts/ (dated July 23, 2009)
  6. http://books.google.com/books?id=14OmJzfCfXMC&pg=PA121#v=onepage&q&f=false (dated 2004)

Answering the true or false questions probably requires examining the publication date of the article, if a specific Oracle release is not indicated in the article.  I have had a little bit of difficulty finding the exact (or even approximate) release dates of Oracle’s Database products in the past, so I performed a couple of searches and posted the results below.  The release dates listed below are from the following articles:
http://www.oracle.com/support/library/brochure/lifetime-support-technology.pdf
Metalink Doc ID 742060.1
http://www.orafaq.com/wiki/Oracle_8
http://www.orafaq.com/wiki/Oracle_8i

8         June 1997
8.1.5     February 1999
8.1.7     September 2000
9.0.1     June 2001
9.2.0.1   July 2002
10.1.0.1  January 2004
10.1.0.5  January 2006
10.2.0.1  July 2005
10.2.0.4  February 2008
11.1.0.6  August 2007
11.1.0.7  September 2008
11.2.0.1  September 2009

When answering the quiz, please provide justification for your answers – answer the question “why is the question true or false”.

The quiz:

1.  Direct path read waits will only appear when parallel query is used, for example, a full table scan executed in parallel. 

2.  Parallel query is enabled at the session and instance level by setting a table’s parallel degree to 32.

3.  Setting a parallel degree for a table or index forces the cost-based optimizer to calculate the cost of a full scan of the table (or index) with a lower value than the cost of the corresponding index’s range scan access path.

4.  Physical block reads performed during parallel query operations are always brought directly into the program global area (PGA), completely avoiding the buffer cache.

5.  The hidden parameter _SERIAL_DIRECT_READ should be set to FALSE to prevent problems caused by excessive direct path reads.

6.  The hidden parameter _SMALL_TABLE_THRESHOLD affects Oracle’s decision to perform direct path reads.
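
A simple way to determine whether direct path reads are happening at all on a given system is to examine the cumulative wait event statistics before and after a test query – a minimal sketch:

SELECT
  EVENT,
  TOTAL_WAITS,
  TIME_WAITED
FROM
  V$SYSTEM_EVENT
WHERE
  EVENT LIKE 'direct path read%';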





True or False – Buffer Busy Waits


April 19, 2010

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

An earlier blog article provided a couple of links that I found to be helpful when encountering buffer busy waits.  There are certainly other helpful web pages within reach of the nearest search engine.  I thought for today’s True or False quiz I would reference some of the pages found by the Google search engine.

Please read the following articles, and try to answer the questions that follow with either a True or False answer, followed by justification for your answer:

  1. http://dba-oracle.com/art_builder_bbw.htm (updated February 2008)
  2. rafioracledba.blogspot.com/2010/03/buffer-busy-waits-and-remedies.html
  3. yanggq.itpub.net/post/1594/111558
  4. remote-dba.net/oracle_10g_tuning/t_buffer_busy_waits.htm
  5. http://www.jlcomp.demon.co.uk/bbw.html
  6. http://books.google.com/books?id=bxHDtttb0ZAC&pg=PA284
  7. http://books.google.com/books?id=tdRes4IdLiIC&pg=PA1176
  8. http://books.google.com/books?id=TmPoYfpeJAUC&pg=PA434
  9. http://books.google.com/books?id=b3DIkYO2gBQC&pg=PA539

The quiz:

1.  Referencing the Top 5 Timed Events at the top of article #1, a Statspack report that indicates 2,154 waits for the CPU is a possible sign of a problem with buffer busy waits.

2. Referencing the Top 5 Timed Events at the top of article #1, a Statspack report that indicates 2,598 waits on the ‘db file sequential read’ wait event and 25,519 waits on the ‘db file scattered read’ wait event provides significant evidence of buffer busy waits when 7,146 seconds and 3,246 seconds, respectively, were reported for those wait events.

3.  I/O bound Oracle database instances frequently experience buffer busy waits.

4.  ‘db file sequential read’ waits and ‘db file scattered read’ waits appearing in the top 5 list in a Statspack report indicate that the database instance is I/O bound.

5.  Eliminating buffer busy waits in the database instance will also reduce the I/O load in the instance.

6.  To reduce the frequency of buffer busy waits, missing indexes should be created, the freelists for the affected tables and indexes should be increased, and then those objects should be moved to an ASSM tablespace.

7.  Referencing article #1, assume that you query V$SYSTEM_EVENT and find that there were a total of 636,528 buffer busy waits.  The reported number of waits on this event indicates that there were 636,528 distinct waits for a block, and this number of buffer busy waits is an indication of a severe problem.

8.  As of February 2008, a block that is in the process of being read into the buffer cache will trigger a buffer busy wait in another session if that other session requires access to that same block.

9.  As of February 2008, the P3 column of V$SESSION_WAIT indicates the reason code for a buffer busy wait, with a value of 0 indicating that a block is in the process of being read into the buffer cache.

10. Freelist groups should be used to reduce segment header contention.
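
When working through the above questions on a live system, the following two queries show where buffer busy waits are actually accumulating – a minimal sketch (V$WAITSTAT breaks the waits down by block class):

SELECT
  CLASS,
  COUNT,
  TIME
FROM
  V$WAITSTAT
ORDER BY
  TIME DESC;

SELECT
  TOTAL_WAITS,
  TIME_WAITED
FROM
  V$SYSTEM_EVENT
WHERE
  EVENT = 'buffer busy waits';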

Tip: Think carefully about the questions – some of the questions might not have obvious answers.





True or False – Autotrace


April 8, 2010

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

I recently encountered a discussion thread (dbaforums.org/oracle/index.php?showtopic=19435) that asked about how to determine whether or not a SQL statement executed in a program uses indexes or full table scans – the database in question is an unspecified release version of Oracle 10g.  One of the responders in that thread pointed to an undated article (praetoriate.com/teas_prae_util11.htm) about the AUTOTRACE functionality of SQL*Plus, while other responders suggested tracing the program’s execution and then using TKPROF.

Please read the article, keeping in mind that the question concerns some release version of Oracle 10g, and see if you are able to answer the following true or false questions.  State why you believe that the question is true, or why you believe that the question is false.  Any answers that attempt to utilize logical fallacies will be scored as incorrect.

1. SQL*Plus’ AUTOTRACE feature performs statement tracing.

2. Using SQL*Plus’ AUTOTRACE feature requires a PLAN_TABLE in the schema of the user using AUTOTRACE.

3. AUTOTRACE retrieves the actual execution plan, along with the associated statistics for the execution.

4. When a large number of rows will be returned by SQL*Plus, the AUTOTRACE TRACEONLY feature should be used.

5. For SQL performance issues, AUTOTRACE is the first tool of choice for investigating the SQL performance issues.

6. Performance issues that are present when SQL is executed in an application will also be present when the SQL statement is executed in SQL*Plus with AUTOTRACE enabled.
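
For reference while answering, a typical AUTOTRACE session in SQL*Plus looks like the following – a minimal sketch, reusing the T1 test table from an earlier article in this series (question 2 can be answered by simply attempting this in a freshly created schema):

SET AUTOTRACE TRACEONLY STATISTICS

SELECT COUNT(*) FROM T1;

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT COUNT(*) FROM T1;

SET AUTOTRACE OFF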

——————————————————————————-
——————————————————————————-

May 25, 2010

The following screen capture is completely unrelated to this blog article, but is relevant to the comments about the blog redesign.  This is how the redesigned blog appears on a 16:9 netbook, with 1024 horizontal pixels of resolution:

I had to scroll the window slightly to the right to see the entire content portion of the blog, while the “floating” navigation section is hidden from view.  Internet Explorer 8.0 offers a zoom feature near the bottom right of the IE window – that feature (or a similar one in other browsers) might be a solution if your monitor does not offer 1024 pixels of horizontal resolution.

————————

Original Layout (4:3 monitor):
 
Modified Layout (4:3 monitor):





True or False – Wait Events


March 31, 2010

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

A continuation of the series of true or false quizzes for the week – maybe this one is too easy.  Today’s quiz is on the topic of wait events.   Assuming that the questions apply to a release of Oracle that has been available for up to five years, state why you believe that the answer to the question is true, or why you believe that the answer to the question is false.  Any answers that attempt to utilize logical fallacies will be scored as incorrect.

1. Assume that a database instance is being prepared for 1,500 to 2,000 sessions.  Given an unlimited budget, it is possible to configure the database instance to completely eliminate wait events.

2. Consider two identical Oracle databases with a single session connected to each database instance.  Each of the two sessions submits exactly the same SQL statement, with the first session experiencing no wait events while executing the query, and the second session experiencing nearly continuous I/O wait events.  The lack of wait events implies that the query execution for the first session is optimal, and should be the performance target for the second session.

3. Idle wait events are insignificant in performance tuning exercises, and should be ignored.

4. For every one minute of elapsed time, each CPU in the server is capable of accumulating 60 seconds of CPU wait time.
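
If you would like to look at actual numbers while considering these questions, the cumulative wait event statistics are exposed through V$SYSTEM_EVENT – a minimal sketch (the WAIT_CLASS column is available in Oracle 10g and later):

SELECT
  WAIT_CLASS,
  EVENT,
  TOTAL_WAITS,
  TIME_WAITED
FROM
  V$SYSTEM_EVENT
ORDER BY
  TIME_WAITED DESC;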

Did I mention that I dislike true/false type questions?  But then these are not simple true/false questions.





True or False – Improving Performance of SQL Statements


March 30, 2010

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

A continuation of the concept of yesterday’s true or false quiz – this one is a little easier, with new reading material.  Today’s quiz is on the topic of improving performance of SQL statements.  The reading material:
Article 1
Article 2
Article 3

Article 4
Article 5
Article 6
Article 7
Article 8 (actually a book)
Article 9 (actually a book)
Article 10 (actually a book)
Article 11

Please read the above articles and see if you are able to answer the following true or false questions, assuming that the questions apply to a release of Oracle that has been available for up to five years.  State why you believe that the answer to the question is true, or why you believe that the answer to the question is false.  Any answers that attempt to utilize logical fallacies will be scored as incorrect.  (Most of these questions can be answered with the help of the first couple of links.)

1. Queries containing subqueries should be rewritten as a logical step in improving query performance.

2. Complex queries should be decomposed into multiple queries using global temp tables and/or WITH clauses as a step toward improving query performance.

3. Significant performance improvements, possibly 20 fold, are possible by replacing some or all of a SQL statement with PL/SQL.  If true, provide an example showing a significant performance improvement with such a replacement.  If not true, provide an example that shows that performance did not improve significantly.

4. There are cases where performance improvements are possible by modifying a SQL statement containing a three-table equijoin, which accesses primary and foreign key columns of the tables, into a SQL statement that accesses a single table with two PL/SQL functions (each containing a SELECT) in column positions.
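
To make question 4 concrete, the transformation it describes has roughly the following shape – a minimal sketch in which the tables (ORDERS, CUSTOMERS, PRODUCTS) and the PL/SQL functions (GET_CUST_NAME, GET_PROD_NAME) are entirely hypothetical:

SELECT
  O.ORDER_ID,
  C.CUST_NAME,
  P.PROD_NAME
FROM
  ORDERS O,
  CUSTOMERS C,
  PRODUCTS P
WHERE
  O.CUST_ID = C.CUST_ID
  AND O.PROD_ID = P.PROD_ID;

SELECT
  O.ORDER_ID,
  GET_CUST_NAME(O.CUST_ID) CUST_NAME,
  GET_PROD_NAME(O.PROD_ID) PROD_NAME
FROM
  ORDERS O;

Whether the second form is ever faster than the first is exactly what the question asks you to decide.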





True or False – Oracle Sorting


March 29, 2010 (Updated April 6, 2010 with a test table)

(Forward to the Next Post in the Series)

I recently encountered a discussion thread (dbaforums.org/oracle/index.php?s=eacd9ff86b358b4a14ecd3fd7653a9fd&showtopic=19407) that pointed to a news article about the internals of Oracle sorting.  The news article (dba-oracle.com/t_oracle_sorting.htm) has a date of October 15, 2007, so it is probably reasonable to assume that the article describes the behavior of Oracle Database 10g R2, and possibly Oracle Database 11g R1.

Please read the news article and see if you are able to answer the following true or false questions.  State why you believe that the question is true, or why you believe that the question is false.  Any answers that attempt to utilize logical fallacies will be scored as incorrect.

1. Sequencing of database output first started in the 1960s.

2. One of the very important components of Oracle tuning is Oracle sorting, yet that process is often overlooked.

3. SSD is a synonym for a super-fast RAM disk.

4. Oracle Database always performs an automatic sorting operation when a GROUP BY clause is used in a SQL statement, when an index is created, and when an ORDER BY clause is used in a SQL statement.

5. The cheapest method is always used by Oracle Database when ordering a resultset.

6. A hinted execution plan involving a single table, with a /*+ index */ hint, will always retrieve the rows in the sorted order of the index.

7. If a SQL statement requires a single sort operation that completes in memory, that SQL statement will not use any space in the TEMP tablespace when the rows are retrieved – with the assumption that a hash join did not spill to disk.

8. The CPU_COST parameter causes Oracle Database to favor the pre-sorted ordering of an index over a discrete sorting operation.

9. The value of the SORT_AREA_SIZE parameter or the PGA_AGGREGATE_TARGET parameter if used, influences Oracle Database’s decision to prefer the pre-sorted ordering of an index over a discrete sorting operation.

10. The clustering factor of an index influences Oracle Database’s decision to prefer the pre-sorted ordering of an index over a discrete sorting operation.

11. The default database block size in use by the database influences Oracle Database’s decision to prefer the pre-sorted ordering of an index over a discrete sorting operation.

12. A sort operation will only spill to disk when RAM is exhausted.

13. “At the time a session is established with Oracle, a private sort area is allocated in memory for use by the session for sorting, based on the value of the sort_area_size initialization parameter.”  Supporting evidence:
http://books.google.com/books?id=gsFC1D1LmvQC&pg=PA306&lpg=PA306#v=onepage&q=&f=false
http://www.articles.freemegazone.com/oracle-sorting.php
oracle-training.cc/oracle_tips_sort_operations.htm

14. For sort intensive tasks it is not possible to adjust the amount of memory allocated to those tasks by adjusting the SORT_AREA_SIZE parameter at the session level.

15. The entire database can be slowed down due to a disk sort in the TEMP tablespace because sorts to disk are I/O intensive.

16. A good general rule is that the SORT_AREA_SIZE parameter should be adjusted to eliminate sorts to disk caused by GROUP BY operations.

17. Buffer pool blocks are allocated to hold or manage the blocks that are in the TEMP tablespace.

18. An optimal workarea execution, completed entirely in memory, is always preferred over a one-pass or multi-pass workarea execution.

19. Free buffer waits can be caused by excessive sorts to disk, which cause data blocks needed by other sessions to be paged out of the buffer.

20. One percent is an acceptable ratio of disk sorts to the total number of sorts.

21. When the PGA_AGGREGATE_TARGET parameter is specified, the total work area size cannot exceed 200MB.

22. No task may use more than 10MB for sorting.

23. A DBA should modify two hidden (underscore) parameters to permit up to 50MB of memory to be used for an in-memory sort operation for a SQL statement.

Have you ever read an article in an attempt to find the answer to a very specific question, only to find that by the time the end of the article is reached, you now have a whole new set of questions?  Try to answer the above questions using something beyond true or false – tell me why in detail it is true or why it is false.  Are there any other questions that could be asked about the article?

——-

Test table for question #10, added April 6, 2010:

CREATE TABLE T1 (
  C1 NUMBER,
  C2 NUMBER,
  C3 VARCHAR2(100),
  PRIMARY KEY (C1));

INSERT INTO T1
SELECT
  ROWNUM,
  DECODE(MOD(ROWNUM,100),0,NULL,ROWNUM),
  RPAD('A',100,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

COMMIT;

UPDATE T1 SET C2=C1 WHERE C2 IS NULL;

COMMIT;

ALTER TABLE T1 MODIFY (C2 NOT NULL);

CREATE INDEX IND_T1_C2 ON T1(C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)
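
With the test table in place, question 10 can be explored by comparing the index’s clustering factor with the optimizer’s decision for an ordered retrieval – a minimal sketch:

SELECT
  INDEX_NAME,
  LEAF_BLOCKS,
  CLUSTERING_FACTOR
FROM
  USER_INDEXES
WHERE
  TABLE_NAME = 'T1';

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT
  *
FROM
  T1
ORDER BY
  C2;

SET AUTOTRACE OFF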




Faulty Quotes 6 – CPU Utilization


February 5, 2010

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

The ideal level of CPU utilization in a server is an interesting topic.  Google (and other search engines) find a number of different web pages that advocate that 100% CPU utilization is ideal, that CPU utilization at 95% is likely catastrophic, that significant queuing for CPU time begins when the CPUs are 75% to 80% busy, as well as a number of other interesting nuggets of information.  It is important to keep in mind that at any one instant, a CPU (or core or CPU instruction thread) is either 100% busy or 0% busy – at any one instant a CPU cannot be 75% busy.  The 75% or 95% utilization figures found on various web sites, in books, and in presentations are actually an average utilization between two points in time – whether those two points in time represent 0.000001 seconds, 24 hours, or somewhere in between could be very important when trying to determine if there is an excessive CPU utilization issue that causes service level agreement problems (or “slowness” reported by end users).

Assume that in a one minute time period, the CPU utilization in a server is 75% – is that suitable, is that undesirable, or is not enough information available to make an educated guess?  Good?  Maybe good?  Bad?  Keep in mind that the CPU utilization is an average utilization between a starting time point and an ending time point – much like with a Statspack/AWR report, if you look at too large of a time period, significant problems may be masked (hidden from view) when the statistics from the time intervals containing problems are averaged over a long time period.  The 75% CPU utilization could indicate that for three out of every four points in time the CPU had work that needed to be performed.  The 75% CPU utilization might also indicate that there was intense competition for the CPU time by many tasks for the first 45 seconds, followed by a complete absence of the need for CPU time in the last 15 seconds of the one minute time period.  For the many tasks competing for CPU time in the first 45 seconds, what might normally complete in one second might have actually required close to 45 seconds due to the operating system attempting to allocate portions of the server’s CPU time to each of the tasks that needed to use the CPU.  The tasks queue up while waiting for their turn for processing, in what is known as the CPU run queue.  As more processes enter the run queue, it takes longer for each process to perform each unit of its normal processing.  This is where the topic of queuing theory becomes very important.  Two very helpful books that discuss queuing theory as it applies to Oracle Database functionality are “Optimizing Oracle Performance” (by Cary Millsap with Jeff Holt) and “Forecasting Oracle Performance” (by Craig Shallahamer).  (Note: This example used one minute as the time interval for measuring CPU utilization in order to rationalize the competition for CPU resources into terms that are easily understood – assuming that a given 3GHz processor is only able to perform one operation at a time, that processor is capable of performing 3,000,000,000 operations per second – 180,000,000,000 operations in that one minute.)

There are a couple of different formulas used in queuing theory, including the Erlang C function and Little’s Law, along with Kendall’s notation for classifying queuing models.  I will not go into significant detail here on the different queuing theory models, but I will provide a simple example.  Assume that you enter a grocery store that has 10 checkout lanes (think of this like 10 CPUs in a database server).  When it is time to pay for the items in your cart, a person working for the store directs you into one of the 10 checkout lanes.  If anyone else is directed into the same checkout lane as you, you will need to alternate with that person at the checkout counter every 10 seconds – when your 10 second time period is up, you will need to load up everything placed on the conveyor belt and allow the other person to unload their items on the belt to use the checkout lane for 10 seconds (this loading and unloading of items could be time consuming).  If anyone else is directed into your checkout lane, that person will also be able to use the checkout counter for 10 second intervals.  In short order, what would have required 5 minutes to complete is now requiring 30 minutes.  If the line grows too long in one checkout lane, there might be a chance to jump into a different checkout lane used by fewer people, possibly once a minute (some Linux operating systems will potentially move a process from one CPU to a less busy CPU every 200ms).  Jumping into a different checkout lane not only allows you to check out faster, but also allows the people who remain in the original line to check out faster.  The above is a very rough outline of queuing theory.  If the customer expects to check out in no more than 10 minutes, how many lanes are necessary, given that customers arrive at a random rate and the target must be met 99% of the time?
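
Two of the simplest results from queuing theory are worth keeping in mind when reading the quotes below.  Little’s Law states that L = λW: the average number of requests in the system (running plus queued) equals the average arrival rate multiplied by the average time each request spends in the system.  For a server with c CPUs, each completing requests at a rate of μ, the average utilization is ρ = λ/(cμ) – and as ρ approaches 1 (100% utilization), the time each request spends queued grows explosively rather than linearly.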

CPU queuing is not a linear problem – 100% CPU utilization is not twice as bad as 50% CPU utilization, it is much worse than that.  Some of the articles below explain this concept very well – a Google search found a couple of interesting articles/presentations that computer science professors assembled for various classes – you might find it interesting to read some of those documents that are found in the .edu domain (it appears that none of those links made it into this blog article).  Some operating systems use a single run queue (for instance, Windows, and Linux prior to the 2.6 kernel release), with the end result of effectively evenly distributing the CPU load between CPUs, causing the processes to constantly jump from one CPU to another (this likely reduces the effectiveness of the CPU caches – pulling everything off the conveyor belt in the analogy).  Other operating systems have a separate run queue for each CPU, which keeps the process running on the same CPU.  Quick quiz: If our 10 CPU server in this example has a run queue of 10 – does that mean that one process is in each of the 10 CPU run queues, or is it possible that all 10 processes will be in just one of the 10 run queues, or possibly something in between those two extremes?  Are all three scenarios equally good or equally bad?

Keep in mind that while sessions are in “wait events” that does not mean that the sessions are not consuming server CPU time.  A session in an Oracle wait event might motivate a significant amount of system (kernel) mode CPU time on behalf of the session.  Sending/receiving data through the network, disk accesses, inspection of the current date/time, and even reading eight bytes (a 64 bit word) from memory motivates the use of the server’s CPU time.  CPU saturation may lead to latch contention (note that latch contention may also lead to CPU saturation due to sessions spinning while attempting to acquire a latch), long-duration log file waits (log file sync, log file parallel write), cluster-related waits, increased duration of single-block and multiblock reads, and significant increases in server response time.

So, with the above in mind, just what did my Google search find?  In the following quotes, I have attempted to quote the bare minimum of each article so that the quote is not taken too far out of context (I am attempting to avoid changing the meaning of what is being quoted).

“Oracle Performance Tuning 101” (Copyright 2001, directly from the book) by Gaja Vaidyanatha states:

“One of the classic myths about CPU utilization is that a system with 0 percent idle is categorized as a system undergoing CPU bottlenecks… It is perfectly okay to have a system with 0 percent idle, so long as the average runnable queue for the CPU is less than (2 x number of CPUs).”

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:6108562636780

“Are you at 100% utilization?  If not, you haven’t accomplished your job yet.  You cannot put CPU in the bank and save it for later.  So, if you are running with idle cycles you should be looking for more ways to use it.”

dba-oracle.com/t_high_cpu.htm

“Remember, all virtual memory servers are designed to drive CPU to 100%, and 100% CPU utilization is optimal, that’s how the server SMP architecture is designed.   You only have CPU enqueues when there are more tasks waiting for CPU, than you have CPU’s (your cpu_count)… Remember, it is not a cause for concern when the user + system CPU values approach 100 percent. This just means that the CPUs are working to their full potential.”

dba-oracle.com/t_tuning_cpu_usage_vmstat.htm

“Remember, it is not a cause for concern when the user + system CPU values approach 100 percent. This just means that the CPUs are working to their full potential. The only metric that identifies a CPU bottleneck is when the run queue (r value) exceeds the number of CPUs on the server.”

dba-oracle.com/t_monitor_cpu_unix.htm

“Within UNIX, the OS is geared to drive CPU consumption to 100%, so the best way to monitor CPU usage is by tracking the ‘r’ column in vmstat”

dba-oracle.com/t_detecting_windows_cpu_processor_bottlenecks.htm

“100% utilization DOES NOT always indicate any bottleneck. It just means that the CPU is busy!  You ONLY have a CPU bottleneck when the runqueue exceeds cpu_count.”

fast-track.cc/op_unix_79_identifying_high_cpu.htm

“Please note that it is not uncommon to see the CPU approach 100 percent even when the server is not overwhelmed with work. This is because the UNIX internal dispatchers will always attempt to keep the CPUs as busy as possible. This maximizes task throughput, but it can be misleading for a neophyte.”

dbaforums.org/oracle/index.php?showtopic=5552

“It’s normal for virtual memory systems to drive the CPU to 100%.

What you need to look for are CPU runqueues, not 100% values”

dbaforums.org/oracle/index.php?showtopic=9986

“No problem! Processors are designed to drive themselves up to 100%.

You are only CPU-bound when the runqueue exceeds the number of processors”

http://forums.oracle.com/forums/thread.jspa?messageID=2518290

“100% utilization is the optimal state. If you want to look for CPU bottlenecks, use vmstat and check the “r” (runqueue) column…  It’s not a claim, it’s a fact, according to the folks who built their servers!  The vendors who build the servers say that 100% CPU utilization is optimal, and they wrote both the OS and the underlying hardware… Every 1st year DBA noob panics at some point when they go into top and see that the CPU is at pegged at 100%.”

http://forums.oracle.com/forums/message.jspa?messageID=2501989

“All SMP architectures are designed to throttle-up the CPU quickly, and a 100% utilization DOES NOT mean an overload. It’s straight from Algorithms 101…  Just to make sure that you are not operating under “assumptions” here, I’m talking about server-side CPU consumption, on an SMP server running lots of concurrent tasks. The references to 100% CPU are as they display in standard OS monitors like lparstat, watch, sar and vmstat.  Also, don’t assume that all OS tasks have the same dispatching priority. In a server-side 100% CPU situation, some tasks may have enqueues, while other do not. That’s what ‘nice’ is for.”

http://books.google.com/books?id=cHHMDgKDXtIC&pg=PA112

“Remember, it is not a cause for concern when the user + system CPU values approach 100 percent.  This just means that the CPUs are working to their full potential. The only metric that identifies a CPU bottleneck is when the run queue (r value) exceeds the number of CPUs on the sever.”

——————–

Before deciding that a 100% CPU utilization is not only normal, but something we should all try to achieve, visit the following links and spend a little time reading the text near the quoted section of the document.

——————–

“Optimizing Oracle Performance” page 264, by Cary Millsap:

“On batch-only application systems, CPU utilization of less than 100% is bad if there is work waiting in the job queue. The goal of a batch-only system user is maximized throughput. If there is work waiting, then every second of CPU capacity left idle is a second of CPU capacity gone that can never be reclaimed. But be careful: pegging CPU utilization at 100% over long periods often causes OS scheduler thrashing, which can reduce throughput. On interactive-only systems, CPU utilization that stays to the right of the knee over long periods is bad. The goal of an interactive-only system user is minimized response time. When CPU utilization exceeds the knee in the response time curve, response time fluctuations become unbearable.”

“Forecasting Oracle Performance” page 71 by Craig Shallahamer:

 “With the CPU subsystem shown in Figure 3-7, queuing does not set in (that is response time does not significantly change) until utilization is around 80% (150% workload increase). The CPU queue time is virtually zero and then skyrockets because there are 32 CPUs. If the system had fewer CPUs, the slope, while still steep, would have been more gradual.”

“Forecasting Oracle Performance” page 195 by Craig Shallahamer:

“The high-risk solution would need to contain at least 22 CPUs. Because the reference ratios came from a 20 CPU machine, scalability is not significant. However, recommending a solution at 75% utilization is significant and probably reckless. At 75% utilization, the arrival rate is already well into the elbow of the curve. It would be extremely rare to recommend a solution at 75% utilization.”

http://forums.oracle.com/forums/thread.jspa?messageID=2518290

“First: check the following simple example of how wrong you can be in saying {‘using’ all of your CPU is a good thing} especially in a multi-user, shared memory environment such as an active Oracle instance. You see, although ‘using’ all of your CPU may be desirable if you don’t waste any of it, in a multi-user system you can waste a lot of CPU very easily – even when nobody goes off the run queue.”

http://forums.oracle.com/forums/message.jspa?messageID=2501989

“But it’s not ‘normal’ to drive CPUs to 100%. Except for extremely exotic circumstances (and that excludes database processing) it means you’ve overloading the system and wasting resources…  Consider the simple case of 8 queries running on 8 CPUs. They will be competing for the same cache buffers chains latches – which means that seven processes could be spinning on the same latch while the eighth is holding it. None of the processes ever need wait, but most of them could be wasting CPU most of the time.”

http://www.dell.com/downloads/global/solutions/public/White_Papers/hied_blackboard_whitepaper.pdf Page 19

“One of the sizing concepts that is independent of the server model is resource utilization. It is never a good idea to attempt to achieve 100% resource utilization. In the Blackboard benchmark tests, the optimum Application Server CPU Utilization was 75% to 90%. In general, clients should size all Application Servers to achieve no more than 75% CPU utilization. For database servers, the optimum CPU utilization is 80% in non-RAC mode. In RAC mode, clients should consider CPU utilization rates around 65% at peak usage periods to allow reserve capacity in case of cluster node failover.”

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/5cb5ed706d254a8186256c71006d2e0a/546c74feec117c118625718400173a3e/$FILE/RDB-DesignAndTuning.doc

“The CPU utilization goal should be about 70 to 80% of the total CPU time. Lower utilization means that the CPU can cope better with peak workloads.  Workloads between 85% to 90% result in queuing delays for CPU resources, which affect response times. CPU utilization above 90% usually results in unacceptable response times.  While running batch jobs, backups, or loading large amounts of data, the CPU may be driven to high percentages, such as to 80 to 100%, to maximize throughput.”

11g R2 Performance Tuning Guide :

“Workload is an important factor when evaluating your system’s level of CPU utilization. During peak workload hours, 90% CPU utilization with 10% idle and waiting time can be acceptable. Even 30% utilization at a time of low workload can be understandable. However, if your system shows high utilization at normal workload, then there is no room for a peak workload.”

Oracle9i Application Server Oracle HTTP Server powered by Apache Performance Guide

“In addition to the minimum installation recommendations, your hardware resources need to be adequate for the requirements of your specific applications. To avoid hardware-related performance bottlenecks, each hardware component should operate at no more than 80% of capacity.”

Relational Database Design and Performance Tuning for DB2 Database Servers

Page 26: “When CPU utilization rises above 80%, the system overhead increases significantly to handle other tasks. The lifespan of each child process is longer and, as a result, the memory usage supporting those active concurrent processes increases significantly. At stable load, 10% login, and CPU utilization below 80%, the memory usage formula is as follows…”

Page 27: “When system load generates a high CPU utilization (>90%) some of the constituent processes do not have enough CPU resource to complete within a certain time and remain ‘active’.”

Oracle Database High Availability Best Practices 10g Release 2

“If you are experiencing high load (excessive CPU utilization of over 90%, paging and swapping), then you need to tune the system before proceeding with Data Guard. Use the V$OSSTAT or V$SYSMETRIC_HISTORY view to monitor system usage statistics from the operating system.”
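As a side note, the check suggested by that quote is easy to sketch (V$OSSTAT exposes BUSY_TIME and IDLE_TIME, in hundredths of a second, on most platforms – the available statistics vary by operating system and release, and the values are cumulative since instance startup, so deltas between two samples are required to obtain a current figure):

SELECT
  ROUND(BUSY.VALUE / (BUSY.VALUE + IDLE.VALUE) * 100, 2) CPU_BUSY_PERCENT
FROM
  V$OSSTAT BUSY,
  V$OSSTAT IDLE
WHERE
  BUSY.STAT_NAME = 'BUSY_TIME'
  AND IDLE.STAT_NAME = 'IDLE_TIME';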

“Optimizing Oracle Performance” page 317

“Oracle’s log file sync wait event is one of the first events to show increased latencies due to the time a process spends waiting in a CPU run queue.”

Metalink Doc ID 148176.1 “Diagnosing hardware configuration induced performance problems”

“In general your utilization on anything should never be over 75-80%…”

http://www.tomfarwellconsulting.com/Queuing%20Presentation.pdf

“Linear thinking is a common human process.  This notion implies that if an input increases by 20 percent the output of the system changes by 20 percent.

Computer response does not follow a linear curve. It is entirely possible that a change in computer input by 20 percent results in a change in output of hundreds of percent.”

http://www.mcguireconsulting.com/newsl_queuing.html

“This utilization-based multiplier increases exponentially and does so rapidly after the 50% utilization point, as shown in the graph below. The translation is: if a resource’s utilization is much beyond 50%, there is a higher probability that congestion will occur. Keep in mind that at 100% utilization, the delay goes to infinity, which is the direction of these curves.”

http://www.db2-dba.net/articles/Article-Usage%20Factor.html

“Request Service time =  ( Ideal Request Service time x  Usage factor ) / (1 – Usage factor )    where   0>=  Usage factor >= 1
The  Request Service time is proportional to U/(1-U).

As you can see from the simple plot above as U reaches 0.95  you are fast approaching the meltdown point.
As U gets closer to .95  the Service time of the system reacts violently and starts approaching infinity. 
The ‘system’  might be your  CPU , DISK, Network, employee or your motor car. 

It is just a bad idea to push the average resource utilization factor beyond 0.9, and the peak resource utilization factor beyond 0.95.”
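(The quoted inequality presumably intends 0 <= Usage factor <= 1.)  To put numbers to the U/(1-U) growth rate, here is a quick sketch using plain SQL against DUAL – nothing Oracle-specific is being measured, the query simply evaluates the quoted formula at several utilization values:

SELECT
  LEVEL * 0.05 UTILIZATION,
  ROUND((LEVEL * 0.05) / (1 - LEVEL * 0.05), 2) QUEUE_MULTIPLIER
FROM
  DUAL
CONNECT BY
  LEVEL <= 19;

The multiplier is 1 at 50% utilization, 4 at 80%, 9 at 90%, and 19 at 95% – which is the violent growth described in the quote.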

http://databaseperformance.blogspot.com/2009/01/queuing-theory-resource-utilisation.html

“So an Oracle database server does conform to the general model in Queuing Theory of having lots of separate clients (the shadow servers) making requests on the resources in the system. And as a result, it does conform to the golden rule of high resource utilisation equals queues of pending requests.

As a result, 100% CPU utilization is very bad, and is symptomatic of very large queues of waiting processes. Queuing Theory also shows that above 50% utilization of a resource, there is always a request in the queue more often than not…  A general rule of thumb is to get worried at 80% utilization, as the number of concurrent requests will average something around four, and rises exponentially above this.”

http://kevinclosson.wordpress.com/2007/07/21/manly-men-only-use-solid-state-disk-for-redo-logging-lgwr-io-is-simple-but-not-lgwr-processing/

“Once LGWR loses his CPU it may be quite some time until he gets it back. For instance, if LGWR is preempted in the middle of trying to perform a redo buffer flush, there may be several time slices of execution for other processes before LGWR gets back on CPU…”

Fix the CPU problem, and the other significant waits may decrease.

Newsflash – 100% CPU is Worse than You Think!

“Amazing. During my entire discussion of CPU load and process priorities I completely ignored the fact that I’m using 2 dual core cpus on that system, and that all Oracle processes use shared memory, which means shared resource, which means locks, which means resource wasting by waiting for locks. And this complicated the discussion, because 6 processes on 8 CPUs will also waste time waiting for locks. You don’t need 100% CPU to suffer from this.”

Opinions are bound to change over time.  The first two quotes are from OakTable Network members, and those quotes were originally written eight or nine years ago.  If you were to ask those two people today (or even shortly after the release of the “Optimizing Oracle Performance” book in 2003), they might state something a bit different about driving and then holding CPUs at 100% utilization.  Interestingly, holding a CPU at 100% utilization will cause its core temperature to gradually increase.  Logic in some CPUs will sense this increase in temperature and throttle back the speed of the CPU until the core temperature drops to an acceptable level.  Of course, when the CPU speed is throttled back, the CPU run queue potentially increases in length because the amount of work required by each process has not decreased, while the number of CPU cycles per second available for performing work has decreased.

More could be written on this subject, but I will leave it at this point for now (for instance, I could have mentioned the process/thread priority gradually decreasing on some operating systems for processes that consume a lot of CPU time).  Opinions, other than this article being too short (or too long)?  As you can probably tell from the quotes, CPU utilization issues are not unique to a single operating system, nor to a single program (Oracle Database).





Faulty Quotes 5 – Block Sizes

31 01 2010

January 31, 2010 (Updated Feb 1, 2010)

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

The topic of deviating from the default 8KB block size in Oracle Database, or of using multiple block sizes in a single database, seems to surface every couple of months in the OTN forums, Oracle-L, the comp.databases.oracle.server Usenet group, and similar discussion forums.  I think that I understand why.  A lot of information has been written that advocates using multiple block sizes in a single Oracle database, or using the largest possible block size to improve “full scan” or “range scan” performance.  Such information is found in blogs, news articles, discussion forums, expert sites, books, and even Oracle’s download.oracle.com website.  So, why do people ask questions about using larger than default block sizes or multiple block sizes in discussion forums if there are so many sources of information that say “just do it”?  Well, chances are that the Google (or other search engine) search that found all of the sources recommending the use of non-standard settings also found several pages where people basically stated “stop, think, understand before making any changes.”  See the Faulty Quotes 3 blog article.

So, you might be curious what my Google search found.  Is it a best practice to implement multiple block sizes in a single database, and is it a best practice to move all of your indexes to a tablespace using the largest supported block size?  (See chapter 1 of Expert Oracle Practices for a discussion on the topic of implementing “best practices”.)  In the following quotes, I have attempted to quote the bare minimum of each article so that the quote is not taken too far out of context (I am attempting to avoid changing the meaning of what is being quoted).
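As background for the quotes that follow, keep in mind what is mechanically involved: a non-default block size requires that a buffer cache for that block size be configured before a tablespace using that block size may be created.  A minimal sketch follows (the cache size, datafile name, and tablespace size are placeholders for illustration, not recommendations):

ALTER SYSTEM SET DB_16K_CACHE_SIZE = 64M;

CREATE TABLESPACE TEST_16K
  DATAFILE '/u01/oradata/test_16k_01.dbf' SIZE 100M
  BLOCKSIZE 16K;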

http://download.oracle.com/docs/cd/E13214_01/wli/docs102/dbtuning/dbtuning.html

“Oracle9i introduced a new feature that allowed a single instance of the database to have data structures with multiple block sizes. This feature is useful for databases that need the flexibility of using a small block size for transaction processing applications (OLTP); and a larger block size to support batch processing applications, decision support systems (DSS), or data warehousing. It can also be used to support more efficient access to larger data types like LOBs.”

http://www.virtual-dba.com/pdfs/Xtivia_WP_Oracle_Best_Practices_2008.pdf  (page 14)

“In Oracle databases 9i, 10g, and 11g, it is a best practice to use multiple block sizes; this allows you to tailor the block size to a specific type of access. Place tables and indexes in tablespaces sized (block size) according to access…”

http://www.oracle.com/technology/products/database/clustering/pdf/bp_rac_dw.pdf (page 19)

“Larger oracle block sizes typically give fewer index levels and hence improved index access times to data. A single I/O will fetch many related rows and subsequent requests for the next rows will already be in the data buffer. This is one of the major benefits of a larger block size. Another benefit is that it will decrease the number of splits.”

dba-oracle.com/art_so_blocksize.htm

“Because the blocksize affects the number of keys within each index block, it follows that the blocksize will have an effect on the structure of the index tree. All else being equal, large 32k blocksizes will have more keys per block, resulting in a flatter index than the same index created in a 2k tablespace.”
“As you can see, the amount of logical reads has been reduced in half simply by using the new 16K tablespace and accompanying 16K data cache. Clearly, the benefits of properly using the new data caches and multi-block tablespace feature of Oracle9i and above are worth your investigation and trials in your own database.”

rampant-books.com/t_oracle_blocksize_disk_i_o.htm

“B-tree indexes with frequent index range scans perform best in the largest supported block size.  This facilitates retrieval of as many index nodes as possible with a single I/O, especially for SQL during index range scans.  Some indexes do not perform range scans, so the DBA should make sure to identify the right indexes”

praetoriate.com/t_oracle_tuning_data_buffer_pools.htm

“This is an important concept for Oracle indexes because indexes perform better when stored in large block size tablespaces.  The indexes perform better because the b-trees may have a lower height and mode entries per index node, resulting in less overall disk overhead with sequential index node access.”

remote-dba.cc/s56.htm

“Indexes want large block sizes – B-tree indexes perform best in the largest supported block size and some experts recommend that all indexes should reside in 32K block size tablespaces. This facilitates retrieval of as many index nodes as possible with a single I/O, especially for SQL performing index range scans.”
“Many DBAs make their default db_block_size 32k and place indexes, the TEMP tablespace and tables with large-table full-table scans in it, using other block sizes for objects that require a smaller fetch size.”

remote-dba.net/unix_linux/multiple_block_sizes.htm

“Large blocks – Indexes, row-ordered tables, single-table clusters, and table with frequent full-table scans should reside in tablespaces with large block sizes.”

oracle-training.cc/s54.htm

“Larger block sizes are suitable for indexes, row-ordered tables, single-table clusters, and tables with frequent full-table scans. In this way, a single I/O will retrieve many related rows, and future requests for related rows will already be available in the data buffer.”

oracle-training.cc/oracle_tips_block_sizes.htm

“Indexes want large block sizes – Indexes will always favor the largest supported blocksize. You want to be able to retrieve as many index nodes as possible in a single I/O, especially for SQL that performs index range scans.  Hence, all indexes should reside in tablespaces with a 32k block size.”

oracle-training.cc/t_oracle_multiple_buffers.htm

“One of the first things the Oracle9i DBA should do is to migrate all of their Oracle indexes into a large blocksize tablespace. Indexes will always favor the largest supported blocksize.”

http://forums.oracle.com/forums/thread.jspa?messageID=2445936

“It’s pretty well established that RAC performs less pinging with 2k blocksizes”
“Large blocks gives more data transfer per I/O call.”
“Indexes like big blocks because index height can be lower and more space exists within the index branch nodes.”

dbapool.com/articles/040902.html

“Index Branches: Larger oracle block sizes typically give fewer index levels and hence improved index access times to data .This is one of the major benefits of a larger block size.”

toadworld.com/LinkClick.aspx?fileticket=fqDqiUsip1Y=&tabid=234  (page 8)

“In Oracle9i and Oracle10g it is a good practice to use multiple block sizes, this allows you to tailor the block size to a specific type of access. Place tables and indexes in tablespaces according to access. For single block read type OLTP access, use 8k block sizes. For full table scan access such as with data warehouses use 16-32K block sizes. For index lookups use 8-16K block sizes. For indexes that are scanned or bitmap indexes, use 16-32K block sizes.”

dbaforums.org/oracle/index.php?s=87341768e1865563322676a1bd504db6&showtopic=83&pid=133&mode=threaded&start=#entry133

“Multiuple blocksizes are GREAT, but ONLY if your database is I/O-bound… Finally, research has proved that Oracle indexes build cleaner in large blocksizes.”

searchoracle.techtarget.com/tip/Oracle-tuning-Blocksize-and-index-tree-structures

“Because the blocksize affects the number of keys within each index block, it follows that the blocksize will have an effect on the structure of the index tree. All else being equal, large 32k blocksizes will have more keys per block, resulting in a flatter index than the same index created in a 2k tablespace… You can use the large (16-32K) blocksize data caches to contain data from indexes or tables that are the object of repeated large scans.”

dbazine.com/oracle/or-articles/burleson2

“Hence, one of the first things the Oracle9i database administrator will do is to create a 32K tablespace, a corresponding 32K data buffer, and then migrate all of the indexes in their system from their existing blocks into the 32K tablespace… Indexes will always favor the largest supported blocksize.”

statspackanalyzer.com/sample.asp

“You are not using large blocksizes for your index tablespaces. Oracle research proves that indexes will build flatter tree structures in larger blocksizes.”

http://books.google.com/books?id=xxx0KAwY_ZMC&pg=PT133#v=onepage&q=&f=false

“If you have large indexes in your database, you will need a large block size for their tablespaces.”
“Oracle provides separate pools for the various block sizes, and this leads to better use of Oracle memory.”

noriegaaoracleexpert.blogspot.com/2007/08/advances-in-multiple-block-size-caches.html

“… and using multiple block caches act as an intelligent cache differentiator that automatically leverage cache performance optimization. I have successfully tested, like many other DBAs and developers, that beyond any possible SGA tuning that using multiple-block-size database can certainly improve performance through this performance approach.”

http://books.google.com/books?id=Wx6OmllCfIkC&pg=PA164#v=onepage&q=&f=false

“Simply by using the new 16K tablespace and accompanying 16K data cache, the amount of logical reads has been reduced by half.  Most assuredly, the benefits of properly using the new data caches and multi-block tablespace feature of Oracle9i and later, are worth examination and trials in the DBA’s own database.”

http://books.google.com/books?id=bxHDtttb0ZAC&pg=PA406#v=onepage&q=&f=false

“Objects that experience full scans and indexes with frequent range scans might benefit from being placed in a larger block size, with db_file_multiblock_read_count set to the block size for that tablespace.”

http://books.google.com/books?id=Uf2pb1c1H2AC&pg=RA1-PA317#v=onepage&q=&f=false

“Indexes want large block sizes: Indexes will always favor the largest supported block size… Hence, all indexes should reside in tablespaces with a 32K block size.”

dba-oracle.com/oracle_tips_multiple_blocksizes.htm (Added Feb 1, 2010):

“At first, beginners denounced multiple block sizes because they were invented to support transportable tablespaces.  Fortunately, Oracle has codified the benefits of multiple blocksizes, and the Oracle 11g Performance Tuning Guide notes that multiple blocksizes are indeed beneficial in large databases to eliminate superfluous I/O and isolate critical objects into a separate data buffer cache:

‘With segments that have atypical access patterns, store blocks from those segments in two different buffer pools: the KEEP pool and the RECYCLE pool…'”

————————-

Before deciding whether or not to implement a large block size (or a very small block size), or add either a larger or smaller than default block size tablespace, I suggest reviewing the following:

http://download.oracle.com/docs/cd/B28359_01/server.111/b28274/iodesign.htm#i20394  (directly relates to Faulty Quotes 3)

“The use of multiple block sizes in a single database instance is not encouraged because of manageability issues.”

“Expert Oracle Database Architecture”
http://books.google.com/books?id=TmPoYfpeJAUC&pg=PA147#v=onepage&q=&f=false

“These multiple blocksizes were not intended as a performance or Tuning feature, but rather came about in support of transportable tablespaces…”

http://www.freelists.org/post/oracle-l/32K-block-size-tablespace-for-indexes,4

“But in most cases the administration overhead is much bigger than the performance benefit. You can easily end up with over- or undersized db_XXk_cache_size and the database can’t do anything about it. Then the performance will be better in some parts of the day and worse later on.”

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1468781700346675276

“I would not recommend going into a system planning on using multiple blocksizes – they were invented for one thing, to transport data from a transactional system to a warehouse (where you might be going from 8k OLTP to 16/32k warehouse) and to be used only to extract/transform/load the OLTP data.”

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1468781700346675276

“My block size is 4096 and my db_32k_cache_size=67108864
I want to create a tablespace with 32K and rebuild all indexes into this tablespace. These are
frequently used indexes. Do you think is there any benefit for using 32K block size in this scenerio”

“before you do something, you should have an identified goal in mind
so, tell us all – WHY would you do this? Don’t say “cause I read on some website it makes things super fast” (it doesn’t), tell us WHY you think YOU would derive benefit from this?
I do not think there is in general benefits to be gained from using multiple block size tablespaces – short of TRANSPORTING data from one block size to another for an ‘extract transform and load’ process.”

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:729373100346730466

“BUT – do not use multiple block sizes for anything other than transporting data from database A to database B where the block size in A is different from B. No silver bullets with this ‘trick’, nothing you want to do in real life. The cases whereby multiple blocksizes are useful are typically limited to benchmarks, old wives tales, and very exceptionally rare conditions.”

http://jonathanlewis.wordpress.com/2009/03/22/block-size-again/
“ORA-01555: snapshot too old” caused by large block size

“Oracle9i Performance Tuning Tips & Techniques”
http://books.google.com/books?id=59ks3deVd0UC&pg=PA9#v=onepage&q=&f=false

“Warning: Oracle development does not support the notion of using multiple block sizes for performance tuning. The nonstandard block caches are not optimized.”

http://forums.oracle.com/forums/thread.jspa?messageID=2445936

“How can I determine which block size is correct for my database.”

“Use 8k. This is right in the middle, and won’t put you in an edge condition. Call it the Goldilocks block, not to small, not to big, just right.
For both OLTP and DSS, 8k is an optimal size. I use 8k, always.
There is minimal gains to be had in messing with block sizes. Having good db design and good execution plans is a better place to worry about performance.”

Series of related articles (there are at least 5 related articles in this series where the author directly addresses many of the claimed benefits of fiddling with block sizes):
http://richardfoote.wordpress.com/category/index-block-size/

A summary of an OTN forums thread – likely the longest thread ever on the topic of block sizes (and very likely on multiple block sizes in the same database) – from June 2008.  The message thread was too large for the new OTN software to support due to performance reasons.  Fortunately, Jonathan Lewis obtained a copy of the thread content in a PDF file:
http://jonathanlewis.wordpress.com/2008/07/19/block-sizes/

Related to the above mentioned OTN thread:
http://structureddata.org/2008/08/14/automatic-db_file_multiblock_read_count/
http://structureddata.org/2008/09/08/understanding-performance/

I posted a number of test cases in the above mentioned OTN thread where I simulated some of the activity in a data warehouse, and activity in an OLTP type database.  The performance was very close to identical in the databases with the default 8KB and the 16KB tablespaces, with just a few exceptions.  As I recall, the 16KB database encountered performance problems when a column with a NULL value was updated, and when a rollback was performed.

Below you will find the scripts to reproduce my test cases that appeared in the above mentioned OTN thread, and the performance results that I obtained.  The OLTP test required roughly 10-12 hours to complete:
Block Size Comparison (save with a .XLS extension and open with Microsoft Excel).

I guess the message is that you should verify that the swimming pool contains water before diving in head first.





Proving that 1=2, is Oracle Wrong to Short-Circuit an Execution Plan?

25 12 2009

December 25, 2009

Earlier this month I wrote a blog article that showed how Oracle behaves when the WHERE clause includes the predicate 1=2.  But is the shortcut implemented by Oracle wrong?  Will 1=2 never happen?

I attended a college course that covered just mathematical proofs (the name of the class escapes me at the moment – Discrete Mathematics?).  The mathematics professor for that course proved that 1=2.  That proof follows (I attempted to align the equal sign to make the proof easier to read):

Assumption: Let a = b
1.  a^2               = a^2
2.  a^2               = b^2
3.  a^2               = b * b
4.  a^2               = a * b
5.  a^2 - b^2         = a * b - b^2
6.  (a + b) * (a - b) = b * (a - b)
7.  a + b             = b
8.  b + b             = b
9.  2b                = b
10. 2                 = 1

 

Mathematical Explanation of the Above Steps:
1.  Truth Statement
2.  Substitution
3.  Factor
4.  Substitution
5.  Subtract b^2 from Both Sides of the Equation
6.  Factor
7.  Divide Both Sides of the Equation by (a - b)
8.  Substitution
9.  Simplification
10. Divide Both Sides of the Equation by b
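If the flawed step is difficult to spot, substituting actual numbers sometimes helps.  A quick check with a = b = 1 (my annotation, not part of the original proof):

6.  (1 + 1) * (1 - 1) = 1 * (1 - 1)   -->   2 * 0 = 1 * 0   -->   0 = 0
7.  Dividing both sides by (a - b) divides both sides by (1 - 1) = 0

Every step through step 6 is true; the proof quietly collapses at the step that divides by zero.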

I think that there is a lesson in the above that may be applied to the understanding of Oracle databases.  Let me ponder the lesson while you review the Faulty Quotes series of blog articles.

A couple of references for mathematical proofs follow – how might the techniques of mathematical proofs be applied to understanding the logic built into Oracle Database?
http://www.eiu.edu/~mathcs/mat2345/index/Webview/Slides/handout-Week02-2×3.pdf
http://www.math.harvard.edu/archive/23b_spring_05/proofs.pdf
http://en.wikibooks.org/wiki/Category:Mathematical_Proof
http://wapedia.mobi/en/Inductive_proof





Faulty Quotes 4 – Buffer Cache Hit Ratio (BCHR)

22 12 2009

December 22, 2009

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Over the years a lot of conflicting advice for maintaining Oracle databases has surfaced in books, magazine articles, websites, Usenet (and other discussion forums), and presentations.  Even as the year 2009 draws to a close there are Oracle forum posts asking for help in improving the buffer cache hit ratio (BCHR). 

Here is an interesting history of recommendations for the buffer cache hit ratio (BCHR), as presented in user conferences for a particular ERP platform:

1999:

Find the statistics in V$SYSSTAT or REPORT.TXT

  Logical reads = db block gets + consistent gets
  Buffer cache hit ratio (%) = ((logical reads - physical reads)/logical reads)*100

Ratio should be above 90%, if not increase the shared_pool_size

——————–

2001:

HIT_RATIO=((DB BLOCK GETS + CONSISTENT GETS - PHYSICAL READS)/
           (DB BLOCK GETS + CONSISTENT GETS))*100

90 – 100 %  Few physical reads. Current size is optimal if not a bit high.  OK to remove some buffers if memory needed elsewhere.
70 – 89 % Buffer cache has low to moderate number of physical reads.  Consider resizing if there is a serious problem with memory on the Oracle database.
0 – 69 % Buffer cache is experiencing moderate to high number of physical reads.  Consider adding more buffers to the buffer cache.

——————–

2004:

Be aware that the Cache Hit Ratio isn’t necessarily the definitive answer; although it can be a good indication that something is up.

——————–

2006:

Bad performance indicators – Poor cache hit ratios

——————–

2008:

The buffer cache hit ratio was never mentioned.

——————–

If the above is any indication, recommending the use of the buffer cache hit ratio as a performance metric is on the decline.  This seems to be confirmed by the helpful replies that typically follow requests for improving the buffer cache hit ratio in various forum threads.  But, there is a twist.  Do we know what the buffer cache hit ratio is supposed to measure?  A search of the Internet, including Metalink, finds a number of formulas for calculating the buffer cache hit ratio.  Some of those formulas follow:

(logical reads – physical reads)/logical reads

 ——

1 – (physical reads)/(consistent gets + db block gets)

——

(logical reads)/(logical reads + physical reads)

——

1 – (physical reads – physical reads direct – physical reads direct (lob))/(consistent gets + db block gets)

——

1 – (physical reads – (physical reads direct + physical reads direct (lob)))/(db block gets + consistent gets – (physical reads direct + physical reads direct (lob)))

So many formulas, and then we have this one from the 11g R2 Performance Tuning Guide:
http://download.oracle.com/docs/cd/E11882_01/server.112/e16638/memory.htm#g61128

1 - (('physical reads cache') / ('consistent gets from cache' + 'db block gets from cache'))

Nice… and a bit confusing.  Pick a formula, any formula.  But, what is it measuring?
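For reference, that last formula translates into a simple query (a sketch only – the statistic names are those used in recent releases, and the result reflects cumulative activity since instance startup rather than a delta over a measurement interval):

SELECT
  ROUND((1 - PHY.VALUE / (CON.VALUE + BLK.VALUE)) * 100, 2) BCHR_PERCENT
FROM
  V$SYSSTAT PHY,
  V$SYSSTAT CON,
  V$SYSSTAT BLK
WHERE
  PHY.NAME = 'physical reads cache'
  AND CON.NAME = 'consistent gets from cache'
  AND BLK.NAME = 'db block gets from cache';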

A search of the Internet finds a large number of articles discussing the benefits of using the buffer cache hit ratio as a performance metric.  For example:

www.cryer.co.uk/brian/oracle/tuning_bchr.htm

“If the cache-hit ratio goes below 90% then:
* For Oracle 8 and earlier: increase the initialisation parameter DB_BLOCK_BUFFERS.
* For Oracle 9 onwards: increate the initialisation parameter DB_CACHE_SIZE.”

——————–

publib.boulder.ibm.com/tividd/td/oracle2/GC32-0454-00/en_US/HTML/oracle20rgf2k81.htm

“In general, if the hit ratio is below 90%, and the dictionary cache has been tuned, increase the init.ora parameter DB_BLOCK_BUFFERS to increase the hit ratio.”

——————–

oracle-training.cc/teas_elite_troub_5.htm

“One must use delta statistics over time to come up with a meaningful value for the ratio, and high logical I/O values can definitely be a leading cause of bad execution times.  However, when properly computed, the buffer cache hit ratio  is an excellent indicator of how often the data requested by users is found in RAM instead of disk, a fact of no small importance.”

——————–

www.iselfschooling.com/mcpfcd1/iscdpfh05vrl.htm

“As a DBA, you are responsible for monitoring and calculating the Buffer Cache Hit Ratio in the SGA memory in case of performance problems…

— If the Buffer Cache Hit Ratio is more than 90% then there is no problem.
— If the Buffer Cache Hit Ratio is between 70% and 90% then there could be a problem.
— And if the Buffer Cache Hit Ratio is less than 70%, there is definitely a problem and the Buffer Cache size needs to be increased.”

——————–

praetoriate.com/t_%20tuning_data_buffer_hit_ratio.htm

“To summarize, our goal as the Oracle DBA is to allocate as much RAM as possible to the data buffers without causing the database server to page-in RAM. Whenever the hourly data buffer hit ratio falls below 90 percent, we should add buffers to the block buffers.”

——————–

dba-oracle.com/art_dbazine_bust_ora_myth_bust.htm

“New myth – Ratio-based Oracle tuning is meaningless…
However, most OLTP systems and systems in which the working set of frequently-referenced data are not cached will greatly benefit from ratio-based tuning. Oracle Corporation recognizes that monitoring SGA usage ratios and adjusting the size of the SGA regions can have a dramatic impact on system performance, and this is the foundation of Oracle10g Automatic Memory Management (AMM) in which Oracle calculates the point of diminishing marginal return from adding data buffers”

——————–

oracle-training.cc/t_viewing_sga_performance.htm

“The data buffer hit ratio can provide data similar to v$db_cache_advice , and most Oracle tuning professionals use both tools to monitor the effectiveness of data buffers and monitor how AMM adjusts the sizes of the buffer pools.”

——————–

oracle-training.cc/t_allocating_oracle_buffer_caches.htm

“The DBHR is a common metric used by Oracle tuning experts to measure the propensity of a row to be in the data buffer.  For example, a hit ratio of 95 percent means that 95 percent of row requests were already present in the data buffer, thereby avoiding an expensive disk I/O.  In general, as the size of the data buffers increases, the DBHR will also increase and approach 100 percent.”

——————–

oracle-training.cc/phys_55.htm

“In order for the DBA to determine how well the buffer pools are performing, it is necessary to measure the hit ratio at more frequent intervals.  Calculating the DBHR for Oracle8 and beyond is more complicated than earlier versions, but the results enable the administrator to achieve a higher level of tuning than was previously possible.”

——————–

dba-oracle.com/t_buffer_cache_hit_ratio_value.htm

“But the question remains about the value of this metric to the DBA.

Once I’ve tuned and stabilized my systems, I notice that the metrics create repeatable ‘signatures’, patterns of usage that form the baselines for the exception alerts.

First, we establish a ‘exception threshold’ for the BCHR, (e.g. +- 20%), and compare that deviation to the historical average, normalized by the historical average per hour and the day-of-the-week.”

——————–

http://forums.oracle.com/forums/thread.jspa?messageID=1701819

“Many folks misunderstand that bit about ‘setting your own BHR’, and falsely conclude that it’s a useless metric. It’s not useless.

Of course, doing lots pre-buffered I/O (consistent gets) will increase the BHR, that’s the whole point, right?

That does not mean that the BHR is useless, it just means that it’s not a panacea.

The BHR remains very useful for detecting ‘undersized’ data buffers, where the working-set is not cached and Oracle is forced to do extra physical reads. . . .

If the BHR was totally useless, why does Oracle continue to include it in OEM alert thresholds, and STATSPACK and AWR reports?

The BHR is just like any other Oracle metric, you must understand its limitations and proper usage. It’s just one of many tools…”

——————–

http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/1a946dbe8dcfa71e  (Many, many pros and cons)

“BCHR can be manipulated. That is nothing new. All stats can be inflated in similar manners. But that doesn’t make them all meaningless. Given everything else being equal, high BCHR is always better than low BHCR…  BCHR alone is not meant to tell performance. If it does, we would not have to look at anything else…  BCHR alone does not tell you about overall performance. It simply tell you the disk I/O percentage. It is an indicator, a very meaningful one.”

A number of examples advocating the use of the buffer cache hit ratio also exist in Oracle books, many of which may be viewed, in part, through Google searches:

Creating a self-tuning Oracle database: automating Oracle9i Dynamic SGA
http://books.google.com/books?id=_rQ-WyEJTgkC&pg=PA51

“The data buffer hit ratio (DBHR) measures the propensity for a block to be cached in the buffer pool, and the goal of the DBA is to keep as many of the frequently used Oracle blocks in buffer memory as possible…”

——————–

Oracle9i High-Performance Tuning with STATSPACK
http://books.google.com/books?id=gsFC1D1LmvQC&pg=RA1-PA264

“From this chart we can clearly see that the DBHR dropped below the recommended value of 90 percent at 3:00 A.M., 4:00 A.M., and 10:00 A.M. each day…  The problem here is that the DBHR is dropping low at 10:00 A.M., a prime-time online period.”

——————–

Oracle Tuning: The Definitive Reference
 http://books.google.com/books?id=bxHDtttb0ZAC&pg=PA217

“A query like the one that follows can be used to see a metric’s behavior for the recent time period.  For example, the following query shows data buffer hit ratio history for the last hour.”

——————–

“Expert Oracle Database 11g Administration” – Page 190

“this is why the buffer cache hit ratio, which measures the percentage of time users accessed the data they needed from the buffer cache (rather than requiring a disk read), is such an important indicator of performance of the Oracle instance.”

The author provides a link on page 1161 to an article authored by Cary Millsap which discusses why a higher buffer cache hit ratio may not be ideal. This is definitely a step in the right direction regarding the buffer cache hit ratio, but it might be better to simply ignore the statistic.

——————–

Oracle Database 10g Performance Tuning Tips & Techniques
http://books.google.com/books?id=omq9aRx8s0EC&pg=P750
http://books.google.com/books?id=omq9aRx8s0EC&pg=PT806

“Some DBAs (usually those trying to sell you a tuning product) minimize the importance of hit ratios (proactive tuning) and focus completely on waits (reactive tuning), since focusing on waits is a great way to quickly solve the current burning problems. By monitoring the Instance Efficiency section (and using all of STATSPACK and Enterprise Manager), the DBA will combine reactive and proactive tuning and will find some problems before the users scream or wait events hit the top 5 list. Hit ratios are one important piece of the puzzle (so are waits).”

“Hit ratios are a great barometer of the health of your system. A large increase or drop from day to day is an indicator of a major change that needs to be investigated.”

“Hit Ratio = (Logical Reads – Physical Reads) / Logical Reads”

“The buffer hit ratio should be above 95 percent. If it is less than 95 percent, you should consider increasing the size of the data cache by increasing the DB_CACHE_SIZE initialization parameter (given that physical memory is available to do this).”

——————–

There are, of course, very strong counter-points to using the buffer cache hit ratio (BCHR) as a tuning metric.  The first two are very well written, detailed articles:

http://jonathanlewis.wordpress.com/2007/09/05/hit-ratios-2/  (a very detailed blog entry)

“Ratios are highly suspect for monitoring purposes. If you think a ratio is helpful, think carefully about whether you should be monitoring the two underlying values instead.

The buffer cache hit ratio (BCHR) is a particularly bad example of the genre as there are so many events that make the attempt to correlate BCHR and performance meaningless.”

——————–

http://richardfoote.wordpress.com/2007/12/16/buffer-cache-hit-ratios-useful-or-not/ (a very detailed blog entry)

“The biggest problem of all with the BCHR is that regardless of it’s values, or whether it goes up or down or remains unchanged, we need to perform precisely the same database checks regardless as it doesn’t tell us whether the ‘health’ of the database has improved, got worse or remains unchanged.”

——————–

http://www.orafaq.com/wiki/Improving_Buffer_Cache_Hit_Ratio

“Many DBAs do their best to get a 99% or better buffer cache hit ratio, but quickly discover that the performance of their database isn’t improving as the hit ratio gets better.”

——————–

http://books.google.com/books?id=dN805AoUyc8C&pg=PA365#v=onepage&q=&f=false

“The evidence that hit ratios are unreliable is overwhelming, and similar ratio fallacies occurring in other industries are well documented.”

——————–

Oracle Performance Troubleshooting: With Dictionary Internals SQL & Tuning
http://books.google.com/books?id=ELVz1O8Z0qoC&pg=PA38

“For example, there have been a number of benchmarks done to prove that a 99-100% buffer cache hit ratio does not mean that a database is running well.  A high cache hit ratio can be observed while the database is literally at a standstill.”

——————–

Cary Millsap’s “Why a 99%+ Database Buffer Cache Hit Ratio is Not Ok”
http://www.oradream.com/pdf/Why%20a%2099%20Cahe%20Hit%20Ratio%20is%20Not%20OK.pdf

“Many tuning professionals and textbook authors sell advice encouraging their customers to enjoy the performance virtues of Oracle database buffer cache hit ratios that approach 100%. However, database buffer cache hit ratio is not a reliable system performance metric. Buffer cache hit ratios above 99% usually indicate particularly serious SQL inefficiencies”

——————–

http://forums.oracle.com/forums/thread.jspa?messageID=3492715

“The data buiffer hit ratio has limited value”

——————–

http://books.google.com/books?id=OOJ5qMytsoYC&pg=PA561

“This ratio should not be used as an indicator of database performance health.  Much Oracle software documentation touts the database buffer cache hit ratio as being one of the most important tuning metrics.  In my opinion and that of many others this statement is often a complete and absolute falsity.”

——————–

http://www.jlcomp.demon.co.uk/Cache_advice.html

“So if someone tells you that the buffer cache hit ratio must be a good thing because Oracle has based their v$db_cache_advice technology on it, then they are displaying a lack of understanding about the deficiencies of the buffer cache hit ratio in particular, and how LRU (least recently used) caching mechanisms work in general.”

——————–

http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/7ef93e3ef8963a29 (Howard J. Rogers)

“Who cares?

Are your users complaining about performance?

If not, then your hit ratio is just fine and dandy, whatever it happens to be.

The more general point here is that, *** and *** notwithstanding, the buffer cache hit ratio is an absolutely abysmal way of tuning anything. It can sometimes offer a useful corollary to other statistics; to allow you to distinguish between two otherwise equally plausible causes for, for example, free buffer waits. But as a performance tuning goal in its own right? Furgedaboudit.”

——————–

http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/7508e23d122b27a2/ (Richard Foote)

“And it only take *one* piece of what I technically define as ‘Crap’ code to both inflate the BHR to incredibly high levels whilst at the same time killing or impacting *database* performance.

I’m probably more sympathetic to BHRs than many. However, it provides only one very small piece in the tuning puzzle, one that needs to be put into perspective. It can be used as an indicator of whether the buffer cache is set way to low/high and nothing more. And what it’s actual *value* is of little consequence, there is no such thing as an ideal value x.

Does a 99.9% BHR mean the database/buffer cache/sql is well tuned. Possibly.

Does a 99.9% BHR mean the database/buffer cache/sql is poorly tuned. Possibly.

So what does a 99.9% BHR actually mean and represent? Without being able to answer this question in it’s fullness, the figure is meaningless.

You get the point.”

——————–

Tuning Oracle Without Cache-Hit Ratios
http://www.quest.com/whitepapers/TuningOracleWithoutCacheHit_new.pdf

“From time immemorial, Oracle performance tuning has the infamous label of witchcraft, secretly practiced by an elite group of individuals, who allegedly use voodoo to cure the performance problems of an Oracle system. To compound this misperception there exists many thousands of pages of published material that propagate the idea of tuning Oracle with cache-hit ratios.”

“Bottom line, Oracle tuning efforts need to be based on isolating and pinpointing bottlenecks (the disease) not cache-hit ratios (the symptoms).”

——————–

Of course, tools are available to help correct a low buffer cache hit ratio:

From the book “Optimizing Oracle Performance”
http://books.google.com/books?id=dN805AoUyc8C&pg=PA368

Connor McDonald’s “Choose any hit ratio”
http://www.oracledba.co.uk/tips/choose.htm

Jonathan Lewis’ “A higher buffer hit ratio is a good thing. (17-Jan-2001)” script
http://www.jlcomp.demon.co.uk/myths.html
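The general idea behind such scripts is simple (the following is my own sketch, not the code behind either of the above links – SMALL_T is a hypothetical one-row table): flood the instance with logical reads against a block that is guaranteed to be cached, driving the ratio toward 100% without accomplishing any useful work.

CREATE TABLE SMALL_T AS
SELECT
  1 N
FROM
  DUAL;

DECLARE
  v_n NUMBER;
BEGIN
  FOR i IN 1 .. 1000000 LOOP
    SELECT MAX(N) INTO v_n FROM SMALL_T;
  END LOOP;
END;
/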





Faulty Quotes 3 – Contradictory Information

21 12 2009

December 21, 2009

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Frustrated by some of the confusing and contradictory information I have encountered on the Internet over the years, I put together a “cheat sheet” to help identify useful information.  The “cheat sheet” lists several questions that one might consider when reviewing books and web articles prior to changing parameters (or implementing other changes) based on the information found in those sources:

  • Is a specific Oracle release mentioned in the book or article? What was true, or thought to be true, with release 8.0.5 might not be true or even a good idea with release 11.2.0.1.
  • Does the article have a publication date, and is there a revision history that identifies the date and what modifications were made to the article? Articles which change from one day to the next without knowing what changed, and why the article changed, are difficult to use as justification for changes to the initialization parameters.
  • Are there any articles by other authors on the Internet which agree with the author’s suggestions or sharply disagree with the author’s suggestions? If Oracle’s official documentation strongly disagrees with the contents of the article, which of the two sources is correct? Should the advice be deemed an over-generalization that worked as a fix for a one-time problem, but is now advertised as something all DBAs should do as a first step in performance tuning?
  • Is there reproducible evidence that supports the claims made? Or, is the majority of the justification similar to “I have seen it a 100 times” or “a DBA at a fortune 50 company said to do this” or “I have been doing this for 25 years, and you should too”?
  • Does the parameter actually control the behavior which it is purported to control, and are there any potential side effects from modifying the parameter?




Faulty Quotes 2 – Test Cases

6 12 2009

December 6, 2009 (Updated February 24, 2010)

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Are test cases valuable or not?  These quotes were found through Internet searches:

http://forums.oracle.com/forums/thread.jspa?threadID=587394

“Oracle is NOT MATH, a single contrary test case does not invalidate any general principle of Oracle performance.”

http://forums.oracle.com/forums/thread.jspa?threadID=963129

“Just because someone show a case where indexes do not need to be rebuilt, that DOES NOT mean that positive cases don’t exist!

I can write a test case to ‘prove’ that virtually any statement about Oracle is un-true. It’s easy.

People who believe that a single negative test case proves something is wrong join the ranks of the ‘deniers’, folks who cite ‘proof’ that the moon landing never happened, and that 911 was a government conspiracy.”

http://forums.oracle.com/forums/thread.jspa?messageID=3195898&tstart=0 (thread was taken offline by OTN, but still shows in search results)

“100% true? You are joking, right? There is NOTHING that is 100% true about database tuning . . . . YOU CANNOT PROVE ANYTHING ABOUT ORACLE PERFORMANCE. EVER. NO EQUATIONS, NO PROOFS, NO WAY, NO HOW. . . . Only fools or charlatans will claim that something about Oracle performance has been ‘proven wrong’. . . .”

http://forums.oracle.com/forums/thread.jspa?threadID=963129&start=0

“It’s ridiculous to reverse engineer Oracle with test cases, when we can ask the people who hold the source code.”

“A test case is not the same as software testing!

The problem is that a single-user ‘test case’ on a PC is not a valid test, by any measure. . .

It does not accurately reproduce real-world behavior, especially in performance tuning, where slowdowns are only seen under heavy loads.”

“It baffles me why any practicng DBA would want to write a test case, when they have a real-world test database, full of real data and waiting to be used . . .”

————————-

(Added February 24, 2010):

If you search this site for the phrase test case, I think that it will be clear that test cases, when properly constructed, are extremely helpful for determining how things work – and how things do not work.  Properly constructing a test case is critical to help eliminate false causation and false correlation.  Employing the scientific method (secondary reference) is important when building test cases to help control false positives and false negatives.  Keep in mind that if something is stated as an absolute (for example “the sun rises in the East” or “on Earth the sun always rises from the East”), it only requires a single negative test case to refute the absolute statement.





Faulty Quotes 1 – OPTIMIZER_INDEX_COST_ADJ

6 12 2009

December 5, 2009

(Forward to the Next Post in the Series)

There are several initialization parameters, some of which are hidden (all hidden parameters begin with an underscore character and should not be changed without authorization from Oracle Support), that control memory utilization, execution plan costing, latch spin behavior, the maximum I/O size for multiblock reads from disk, etc. It is easy to fall into the pattern of blindly changing an initialization parameter without identifying the actual source of the problem that the parameter change is expected to magically correct. This approach of blindly changing initialization parameters without understanding either the scope/purpose of the parameters or the source of the problem is apparently encouraged by various Oracle books, official looking web pages, and web discussion threads.

One such parameter that is frequently abused is OPTIMIZER_INDEX_COST_ADJ.  This parameter specifies the percentage of the calculated index access cost to retain.  A value of 1 for OPTIMIZER_INDEX_COST_ADJ is a bad idea: not only does it make index access paths appear to cost 1/100th of the original calculation (1% of the original cost), which drives the optimizer toward index use, but it also potentially causes the wrong index to be used when two or more indexes have the same (rounded) calculated cost.
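Rather than guessing at the effect, the parameter may be adjusted at the session level and the calculated costs compared (a sketch – T10 and its indexed column C1 are hypothetical stand-ins for a real table):

ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ = 100;

EXPLAIN PLAN FOR
SELECT C2 FROM T10 WHERE C1 BETWEEN 1 AND 100;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ = 1;

EXPLAIN PLAN FOR
SELECT C2 FROM T10 WHERE C1 BETWEEN 1 AND 100;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

With the second setting, the cost of the index access path in the plan output should drop to roughly 1% of the cost displayed with the default setting, which is how the parameter “drives” index use.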

Example quotes recommending low values for this parameter, even in 10g R1 and above:

praetoriate.com/t_op_sql_index_behavior.htm and
http://books.google.com/books?id=4xEozkbvxboC&pg=RA1-PA359#v=onepage&q=&f=false

“The optimizer_index_cost_adj parameter is an initialization parameter that can be very useful for SQL tuning. It is a numeric parameter with values from zero to 1,000 and a default value of 1,000.”

praetoriate.com/t_op_sql_index_behavior.htm

“If you are having slow performance because the CBO first_rows mode is favoring too many full-table scans, you can reset the optimizer_index_cost_adj parameter to immediately tune all of the SQL in your database to favor index scans over full-table scans.”

http://books.google.com/books?id=TuzniwcsNtUC&pg=PT771#v=onepage&q=&f=false

“Unfortunately, the optimizer might do more full table scans as a result of setting DB_FILE_MULTIBLOCK_READ_COUNT (you don’t want this behavior), so you may also need to set the OPTIMIZER_INDEX_COST_ADJ to a number, such as 10, to eliminate this problem and drive the use of indexes.”

http://books.google.com/books?id=omq9aRx8s0EC&pg=PT171#v=onepage&q=&f=false

“If the use of DB_FILE_MULTIBLOCK_READ_COUNT starts to cause too many full table scans (since the optimizer now decides it can perform full table scans much faster and decides to do more of them) then set OPTIMIZER_INDEX_COST_ADJ between 1 and 10 (I usually use 10) to force index use more frequently.”

oracle-training.cc/oracle_tips_opq.htm

“The default value for optimizer_index_cost_adj is 1,000, and any value less than 1,000 makes the CBO view indexes less expensive. If you do not like the propensity of the CBO first_rows mode to favor full-table scans, you can lower the value of optimizer_index_cost_adj to 10, thereby telling the CBO to always favor index scans over full-table scans.”

http://books.google.com/books?id=bxHDtttb0ZAC&pg=PA566#v=onepage&q=&f=false

“The most important parameter is the optimizer_index_cost_adj, and the default setting of 100 is incorrect for most Oracle systems.  For OLTP systems, resetting the parameter to a smaller value (between 10 and 30) may result in huge performance gains as SQL statements change from large-table full-table scans to index range scans.”

http://willgreene.net/siebel/Siebel%20Knowledgebase/Performance%20Tuning%20Guide%20for%20Siebel%20on%20Oracle.pdf

“OPTIMIZER_INDEX_COST_ADJ – Controls the access path selection to be more or less index friendly.  Recommended Value = 1

For Oracle 9i CBO setting OPTIMIZER_INDEX_COST_ADJ = 1 is imperative. This will set the optimizer goal for best response time (versus best throughput). Incorrect setting may cause the optimizer to favor full-table scans instead of index access.
• For Oracle 10g/11g CBO setting OPTIMIZER_INDEX_COST_ADJ = 1 is also recommended, although the default setting of 100 will deliver good results in most cases. It is important to understand that in-house tuning of Siebel CRM application was performed with OPTIMIZER_INDEX_COST_ADJ = 1 setting. This means that customers who want to implement OPTIMIZER_INDEX_COST_ADJ = 100 on Oracle 10g/11g will need to allocate extra development time for additional tuning that may be necessary.
Under no circumstances OPTIMIZER_INDEX_COST_ADJ parameter should be set to values other than 1 (Siebel recommended) or 100 (Oracle default on 10g/11g).”

Below is a link to a test case which shows that even when retrieving just 0.06% of the rows from a 100 million row table, a full table scan is potentially faster than an index range scan (the index clustering factor was very high):
http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/6c436cee329326ec

Before experimenting with this parameter, take a look at the following:
http://books.google.com/books?id=b3DIkYO2gBQC&pg=PA185#v=onepage&q=&f=false

http://richardfoote.wordpress.com/2009/07/08/the-cbo-and-indexes-optimizer_index_cost_adj-part-i/

http://richardfoote.wordpress.com/2009/07/22/the-cbo-and-indexes-optimizer_index_cost_adj-part-ii/

http://richardfoote.wordpress.com/2009/08/20/the-cbo-and-indexes-optimizer_index_cost_adj-part-iii/

http://forums.oracle.com/forums/thread.jspa?messageID=3917291 (Joze Senegacnik)

“The parameter optimizer_index_caching is considered by CBO for IN list and NESTED LOOP operations and should be set. The optimizer_index_cost_adj could be very dangerous unless you really know what you are doing, especially when system statistics is used. It was introduced in 8i, but in later releases one should use system statistics to tell CBO what is the timing difference between single block and multiple block I/O. Jonathan Lewis has written about this on his site.

What I really hate is that someone says: set this parameter to a certain value without considering what kind of system is that and what is even more important: without considering what are the consequences of such setting. It is like taking a wrong medicine.

My personal opinion regarding optimizer_index_cost_adj is that one should leave it at 100 (default) and if you really understand the mechanism behind then you may experiment with it. Personally I would use it only at statement level by using OPT_PARAM hint if this would be really necessary. This way you don’t make a system wide change.”