True or False – Oracle Sorting

29 03 2010

March 29, 2010 (Updated April 6, 2010 with a test table)

(Forward to the Next Post in the Series)

I recently encountered a discussion thread (dbaforums.org/oracle/index.php?s=eacd9ff86b358b4a14ecd3fd7653a9fd&showtopic=19407) that pointed to a news article about the internals of Oracle sorting.  The news article (dba-oracle.com/t_oracle_sorting.htm) has a date of October 15, 2007, so it is probably reasonable to assume that the article describes the behavior of Oracle Database 10g R2, and possibly Oracle Database 11g R1.

Please read the news article and see if you are able to answer the following true or false questions.  State why you believe that the question is true, or why you believe that the question is false.  Any answers that attempt to utilize logical fallacies will be scored as incorrect.

1. Sequencing of database output first started in the 1960s.

2. One of the very important components of Oracle tuning is Oracle sorting, yet that process is often overlooked.

3. SSD is a synonym for a super-fast RAM disk.

4. Oracle Database always performs an automatic sorting operation when a GROUP BY clause is used in a SQL statement, when an index is created, and when an ORDER BY clause is used in a SQL statement.

5. The cheapest method is always used by Oracle Database when ordering a resultset.

6. A hinted execution plan involving a single table, with a /*+ index */ hint, will always retrieve the rows in the sorted order of the index.

7. If a SQL statement requires a single sort operation that completes in memory, that SQL statement will not use any space in the TEMP tablespace when the rows are retrieved – with the assumption that a hash join did not spill to disk.

8. The CPU_COST parameter causes Oracle Database to favor the pre-sorted ordering of an index over a discrete sorting operation.

9. The value of the SORT_AREA_SIZE parameter or the PGA_AGGREGATE_TARGET parameter if used, influences Oracle Database’s decision to prefer the pre-sorted ordering of an index over a discrete sorting operation.

10. The clustering factor of an index influences Oracle Database’s decision to prefer the pre-sorted ordering of an index over a discrete sorting operation.

11. The default database block size in use by the database influences Oracle Database’s decision to prefer the pre-sorted ordering of an index over a discrete sorting operation.

12. A sort operation will only spill to disk when RAM is exhausted.

13. “At the time a session is established with Oracle, a private sort area is allocated in memory for use by the session for sorting, based on the value of the sort_area_size initialization parameter.”  Supporting evidence:
http://books.google.com/books?id=gsFC1D1LmvQC&pg=PA306&lpg=PA306#v=onepage&q=&f=false
http://www.articles.freemegazone.com/oracle-sorting.php
oracle-training.cc/oracle_tips_sort_operations.htm

14. For sort intensive tasks it is not possible to adjust the amount of memory allocated to those tasks by adjusting the SORT_AREA_SIZE parameter at the session level.

15. The entire database can be slowed down due to a disk sort in the TEMP tablespace because sorts to disk are I/O intensive.

16. A good general rule is that the SORT_AREA_SIZE parameter should be adjusted to eliminate sorts to disk caused by GROUP BY operations.

17. Buffer pool blocks are allocated to hold or manage the blocks that are in the TEMP tablespace.

18. An optimal workarea execution, completed entirely in memory, is always preferred over a one-pass or multi-pass workarea execution.

19. Free buffer waits can be caused by excessive sorts to disk, which cause data blocks needed by other sessions to be paged out of the buffer.

20. One percent is an acceptable ratio of disk sorts to the total number of sorts.

21. When the PGA_AGGREGATE_TARGET parameter is specified, the total work area size cannot exceed 200MB.

22. No task may use more than 10MB for sorting.

23. A DBA should modify two hidden (underscore) parameters to permit up to 50MB of memory to be used for an in-memory sort operation for a SQL statement.

Have you ever read an article in an attempt to find the answer to a very specific question, only to find that by the time the end of the article is reached, you now have a whole new set of questions?  Try to answer the above questions using something beyond true or false – tell me why in detail it is true or why it is false.  Are there any other questions that could be asked about the article?

——-

Test table for question #10, added April 6, 2010:

CREATE TABLE T1 (
  C1 NUMBER,
  C2 NUMBER,
  C3 VARCHAR2(100),
  PRIMARY KEY (C1));

INSERT INTO T1
SELECT
  ROWNUM,
  DECODE(MOD(ROWNUM,100),0,NULL,ROWNUM),
  RPAD('A',100,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

COMMIT;

ALTER TABLE T1 MODIFY (C2 NOT NULL);

CREATE INDEX IND_T1_C2 ON T1(C2);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)

Actions

Information

6 responses

29 03 2010
Charles Hooper

I hear someone in the back shouting: “Mr. Hooper, these questions are impossibly difficult to answer correctly.” I normally dislike true/false questions, but these might be interesting questions to ask during an interview.

To get you started:
1. False. Why? When was the first version of Oracle Database (version 2.0 if I recall correctly) released, in the 1960s? No, so the question might be describing a different type of database. That raises the question, just what is a database? Define:database
“•an organized body of related information”
“•A database is an integrated collection of logically related records or files consolidated into a common pool that provides data for one or more…”
“•A collection of (usually) organized information in a regular structure, usually but not necessarily in a machine-readable format accessible by a …”

So, if the question is not necessarily describing an Oracle Database, could the question possibly be referring to a database of some kind that was invented before 1960 and in the 1960s someone finally decided, wow, random order is useless – we really should put this census (or some other information) into some sort of sequence.

OK, that was the easy question.

30 03 2010
Chris_c

I’m not sure if the intention os to answer based only on the information in the article but I’ll give a couple of these a shot..

2: True
But so are dozens (hundreds) of other things, sorting can cause problems when you sort when you don’t need to or you sort then throw most of the data away after sorting it, but the same can be said about every area of tunning.

3: False
Solid State Disk is a persistant storage mechanism, once data is written to the disk it stays there until deleted or overwritten just like those round spinning things us mortals use. RAM Disk is based on volatile memory either in dedicated devices or an area carved out of the main system RAM I first came accross this when using a 48k ZX Spectrum, when you power off any data in a RAM disk is lost.

4: False
Hash group by — Grouping not using a sort
create index ….. nosort — index created not using a sort
Order by – not sure on this one I suspect there are cases where the sort operation is not required i.e. retrieving data from an IOT ordered by the primary key or form an index scan where only the indexed colums are required but i’d need to check.

5: False
The plan calculated as having the cheapest overall cost will be used this may include a more expensive sort operation than an alternate plan if the overall cost is calculated a lower also the calculated cost can be a poor match to the real cost for a numbe rof reasons e.g. bad stats, complicated query joins/bad querys, dodgy hints or good old fashioned bugs.

6: False
Oracle documentation states that the only way to guarantee the order of a resultset is to use and order by clause using and index hint may or may not give you the results in order but it is never guaranteed and the behaviour may change between versions or patchsets.

Thats all for now, better go do some real work.

30 03 2010
Charles Hooper

Chris,

Thanks for participating. The intention here is to provide accurate answers to the questions, in an effort to limit confusion.

Nice justifications – you left enough room for others to extend your answers should they decide to participate (with your answer for #3 as a possible exception – question #3 had no chance of being correct).

1 04 2010
Log Buffer #184, a Carnival of the Vanities for DBAs | The Pythian Blog

[…] gear: Charles Hooper posted a 3-part series with seemingly innocent True/False questions. He covers sorting, SQL tuning and wait […]

6 04 2010
Charles Hooper

#6 False, as indicated by Chris. It could be that the index hint is invalid, and thus cannot be used. It could be that an index fast full scan would be performed, rather than an index range scan or full scan. It could be that the hint was included in an inline view, with an explicit sort order outside of the inline view. A “single table” does not necessarily mean that a self-join was not used, which might imply a much more complex SQL statement. The question even leaves open the possibility of an explicit ORDER BY clause that forces a different sort order. See the answer provided by Chris.

#7 False. If the SORT_AREA_SIZE and SORT_AREA_RETAINED_SIZE parameters are in use (for example, if the WORKAREA_SIZE_POLICY parameter is set to FALSE at the session level) and the SORT_AREA_RETAINED_SIZE parameter is set to a smaller value than SORT_AREA_SIZE, a sort could complete entirely in memory, and then the results need to be spooled to the TEMP tablespace before being returned to the client. There is a much better description in the book “Cost Based Oracle Fundamentals”.

#8 False. The CPU_COST parameter only exists to cause confusion. To explain, a quote from this source “When data using the new cpu_cost parameter in Oracle10g, the Oracle SQL optimizer builds the SQL plan decision tree based on the execution plan that will have the lowest estimated CPU cost.” Sounds logical, or sounds confusing?

A Google search found this AskTom thread “Well, two things. First, you don’t switch on cpu costing in 10g, it is just there and you have to poke around for a hidden parameter to turn it off (or use a hint “no_cpu_costing” to disable it for a single query). Next there is no such parameter as ‘cpu_cost’, there is a computed cpu_cost — but that is not driven by the number of CPU’s but rather how fast they are reported to be (gathered via system stats).” Sounds logical, yet it conflicts with the previous quote.

If you use my VBS script to retrieve the hidden parameters, you will find that there are no parameters with a name that is even close to CPU_COST, hidden or not. This alone is enough to state that the CPU_COST parameter does NOT cause Oracle Database to favor the pre-sorted ordering of an index over a discrete sorting operation.

Tricky questions, but someone must ask them…

6 04 2010
Charles Hooper

#10 Maybe, maybe not. Here is an extreme example using the test table T1 at the end of the set of questions:

Let’s check the current statistics for the index on column C2:

SELECT
  INDEX_NAME,
  CLUSTERING_FACTOR,
  NUM_ROWS
FROM
  DBA_INDEXES
WHERE
  OWNER=USER
  AND TABLE_NAME='T1';
 
INDEX_NAME   CLUSTERING_FACTOR   NUM_ROWS
------------ ----------------- ----------
SYS_C0020582               159      10000
IND_T1_C2                  159      10000

A very good, very low value for the clustering factor. So, now we need a test:

SET AUTOTRACE TRACEONLY EXPLAIN
 
SELECT
  *
FROM
  T1
ORDER BY
  C2;
 
Plan hash value: 4220775576
 
-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           | 10000 |  1054K|   182   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        | 10000 |  1054K|   182   (1)| 00:00:01 |
|   2 |   INDEX FULL SCAN           | IND_T1_C2 | 10000 |       |    22   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

An index full scan operation, with no sorting operation. Let’s set the clustering factor to the same value as the number of rows in the table, the maximum value:

EXEC DBMS_STATS.SET_INDEX_STATS(OWNNAME=>USER,INDNAME=>'IND_T1_C2',CLSTFCT=>10000,NO_INVALIDATE=>FALSE)

Now we are able to repeat the test:

SELECT
  *
FROM
  T1
ORDER BY
  C2;
 
Plan hash value: 2148421099
 
-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |  1054K|       |   217   (2)| 00:00:02 |
|   1 |  SORT ORDER BY     |      | 10000 |  1054K|  2376K|   217   (2)| 00:00:02 |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |  1054K|       |    11  (10)| 00:00:01 |
-----------------------------------------------------------------------------------

The execution plan changed, but why? Unfortunately, the article did not say why the execution plan would change, only that the clustering factor influences the Oracle Database’s decision to prefer the pre-sorted order if the clustering factor decreases – but the article did not state that the pre-sorted order would be preferred if the clustering factor increases.

But, why does it happen? Can someone give me a hint?

SELECT /*+ INDEX(T1 IND_T1_C2) */
  *
FROM
  T1
ORDER BY
  C2;
 
Plan hash value: 4220775576
 
-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           | 10000 |  1054K| 10034   (1)| 00:00:48 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        | 10000 |  1054K| 10034   (1)| 00:00:48 |
|   2 |   INDEX FULL SCAN           | IND_T1_C2 | 10000 |       |    22   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Increasing the clustering factor increased the cost of retrieving the rows through the index access path, and that is why Oracle’s optimizer elected to perform a full table scan, rather than an index full scan. So, it is not purely the CLUSTERING_FACTOR that caused the change, but rather the CLUSTERING_FACTOR’s influence on the cost of the access path that caused the change in the execution plan.

Leave a comment