DB_BLOCK_SIZE and DB_FILE_MULTIBLOCK_READ_COUNT 3 – What is Wrong with this Quote?

4 12 2010

December 4, 2010

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

I recently reviewed the book “Oracle Tuning the Definitive Reference Second Edition”, and did not provide an in-depth technical review of the entire book.  As I stated in a comment in the earlier blog article, I would like to clarify that my review on the Amazon site is not intended to suggest that people should not buy the book. As the publisher’s website states that the book is written for senior Oracle DBAs, I suggest that senior DBAs, Oracle Certified Masters, and other people who are able to research the book’s contents *should* buy the book and post reviews of the book that highlight specific content of the book indicating whether or not that content is correctly stated. Such feedback will be a benefit to the Oracle community, and could help to improve Oracle Database books that are published in the future. I think that we need to keep a positive outlook in order to see things improve

With that said, what is wrong with the following quote from page 719 of the book (please ignore the error in the SQL statement found on that page for the moment)?

“Four factors synchronize to help the CBO choose whether or use an index or a full-table scan:

7.  The selectivity of a column value
8.  The db_block_size
9.  The avg_row_len
10. The cardinality

An index scan is usually faster if a data column has high selectivity and a low clustering_factor as shown in Figure 15.8.”

What, if anything, is wrong with the above quote from the book?

The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true.  It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past).  If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.

A Google book search indicates that a slightly modified version of the statement appeared in a couple of other books:

Other pages found during a Google search of the phrase:

—————–

For some reason, the following quote seems to come to mind:

“Fool me once, shame on me.  Fool me twice, shame on you.” (reference)


Actions

Information

4 responses

4 12 2010
Jonathan Lewis


Just the place for a Snark ! I have said it twice:
That alone should encourage the crew.
Just the place for a Snark ! I have said it thrice
What I tell you three times is true.

Lewis Carroll – The hunting of the Snark

The first thought that springs to mind is that when listing 4 points you don’t usually start at number 7.

Second is that the clustering_factor doesn’t make it to the list (although it is mentioned immediately afterwards – which also makes me wonder whether the author read the list before inserting it).

If the author thinks the “db_block_size” is one of “the four factors”, you have to wonder why the db_file_multiblock_read_count is not, why the values for the MBRC, mreadtim, sreadtim system statistics are not, and why the author does not believe that the block sizes of the tablespaces holding the table and its indexes are not factors (the author is, I believe a strong proponent of putting indexes into tablespaces with the largest possible block size).

I hope the author includes a detailed discussion of why the db_block_size affects the optimizer’s choice. There is no direct relationship between cost and db_block_size, there are only side effects, and it is far from trivial to understand these side effects.

Like db_block_size, the avg_row_len is not a factor that the optimizer uses when making the choice between a tablescan and an indexed access path. There are, of course, (subtle) side effects (again) – e.g. longer rows means fewer rows per block which affects the probability of two adjacent key values pointing to two different blocks (hence an increase in the clustering_factor, hence the cost of using the index); then again, longer rows means fewer rows per block, which means more table blocks which increases the cost of the tablescan ! (We might also note that the average index key length affects the choice – which also means the level of prefix compression affects the choice, since both will affect the number of leaf blocks in the index, and the leaf block count really IS a factor influencing the cost of using the index.)

The “selectivity of a column” – what about multi-column indexes ?

The “cardinality” is derived from the selectivity as an estimate of the number of output rows – it isn’t an input to the cost, so doesn’t affect the choice of tablescan or index. If you wanted an explanation for a novice you might couch it in terms of cardinality – but that’s like talking about “the Sun rising” rather than “the Earth rotating”.

“High” selectivity – I think we could do with the author’s definition of selectivity here. For the purposes of the optimizer, “selectivity” is a number between 0 and 1 and, all other things being equal, the high the selectivity for an indexed access patt the less likely the optimizer is to choose it.

The text switches from the cost to the run time without making the point that the cost is an estimate of the run time and the run time could be completely different. It is (for example) easy to have a very high clustering_factor and still have a very fast run time because of the way that Oracle generates the clustering_factor.

The trouble with getting things right is that it takes so much longer, and so much more effort than just putting a few vaguely relevant words in roughly the right order.

5 12 2010
Charles Hooper

Jonathan,

Thank you for the extensive explanation, it is very helpful. This quote from th book might be just one example that shows why it is a bad idea to recycle material that was published in 2001 – the errors and problems caused by the missing details are amplified nine years later due to the new features added to Oracle Database over that time.

In your comment, you stated ” There is no direct relationship between cost and db_block_size, there are only side effects, and it is far from trivial to understand these side effects.” And ” I believe a strong proponent of putting indexes into tablespaces with the largest possible block size.” The book is also a strong proponent of replacing most full table scans with some sort of index access path. There is an interesting tie-in with Randolf Geist’s System Statistics presentation, which includes a section (starting at slide 59) that discusses the effects of non-uniform block sizes on the Oracle optimizer’s costing of access paths. Randolf’s test case seems to show a significant drop in the optimizer’s costing calculations for the objects in the larger block size – dropping the cost number without improving the execution time. Randolf’s test case used a 10,000 block example with 1 row per block in a locally managed tablespace using manual segment space management and a default 8KB block size (DB_BLOCK_SIZE). His test demonstrated what happens to the execution plan costs when the cost model changes between the various cost models:
Traditional I/O Costing:
2KB block tablespace: 2441
8KB block tablespace: 1517
16KB block tablespace: 1199

NOWORKLOAD System Statistics:
2KB block tablespace: 7708
8KB block tablespace: 2708
16KB block tablespace: 1875

WORKLOAD System Statistics:
2KB block tablespace: 10833
8KB block tablespace: 2708
16KB block tablespace: 1354

Quite clearly from the above, if someone is convinced to tune by cost (I do not recall the book suggesting that), one would create a database with a 2KB default tablespace, create a 32KB tablespace, and then create all objects in the 32KB tablespace. It might be worth experimentation to see what happens to the costing of index access paths when the indexes are recreated in a 32KB or 16KB tablespace with the table data still residing in the 8KB (or 2KB) default tablespace – would this change automatically tune the whole database to use indexes rather than full tablescans? I see the start of a silver bullet.

6 12 2010
7 12 2010
Charles Hooper

Centinul,

Thank you for the links – those will be very helpful for the sections of the book that offer advice about index rebuilds. It might be interesting to read Richard Foote’s review of the book, if he bought a copy.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Follow

Get every new post delivered to your Inbox.

Join 142 other followers

%d bloggers like this: