February 22, 2010 (Updated March 7, 2010, August 10, 2010)
As mentioned in Chapter 8 of the “Expert Oracle Practices” book, there are several ways to measure the amount of CPU time used, whether at the operating system level, at the instance-wide level, at the session level, or at the SQL statement level. In a previous blog article I described why driving the database server’s CPUs to 100% utilization was a bad idea and provided quotes from several articles that disagreed as well as several that agreed with my statements (Cary Millsap’s follow-up comment should be read carefully).
I mentioned in the previous blog article that some CPUs will slow their core frequences if thermal thresholds are reached, which might be possible by pegging the CPU at 100% utilization for long periods of time. Intel offers the Intel Processor Identification Utility for the Windows platform (similar utilities for other platforms and non-Intel processors probably also exist) that shows the rated processor speed and the current CPU speed for each CPU/core in the server. For example, the following is the output from the Intel program when run on my three year old laptop:
Here is the same laptop, captured in a different time period. This particular slow down was not caused by a thermal threshold problem, rather it was created by setting the Windows power plan to Power Saver:
In the above, notice that the reported core speed for both CPU cores is 50% of the expected CPU speed, and even the system bus speed dropped slightly (I have seen lower CPU speeds and system bus speeds when the computer wakes from hibernation mode). (Edit: See the pictures at the end of the article for examples of slower speeds 8+ hours after coming out of hibernation mode.)
—
Let’s jump into Oracle now (we will use Oracle Database 11.2.0.1 and 11.1.0.6). There are several statistics in the database views that report CPU utilization. One such statistic is CPU used by this session. This is a potentially challenging statistic to use. From the documentation:
CPU used by this session: Amount of CPU time (in 10s of milliseconds) used by a session from the time a user call starts until it ends. If a user call completes within 10 milliseconds, the start and end user-call time are the same for purposes of this statistics, and 0 milliseconds are added. A similar problem can exist in the reporting by the operating system, especially on systems that suffer from many context switches.
CPU used when call started: The CPU time used when the call is started
If we try to understand the above, the CPU used by this session statistic is the time, in centiseconds, of CPU utilization while a user call is running on the server’s CPUs. The second statistic, CPU used when call started, indicates the value of the CPU used by this session statistic when the user call was started. The documentation leaves a couple of questions unanswered:
- When are these statistics updated? If the CPU used by this session statistic is updated in real-time (if you are wondering, it is Not), the CPU used when call started statistic could be used to determine how long the session has been on the CPU (and not in a wait event and not in one of the other CPU run states), but if the statistics are not updated in real time, what is the point of having two separate statistics?
- What is considered a user call – is it the full execution of a user’s SQL statement, or is it a fetch of a block from disk, or is it the completion of a fetch of some of the result rows, or is a user call something entirely different?
We need a test case to see what is happening with these statistics, and several of the other statistics that measure CPU utilization (warning, this test table may take an hour or longer to create).
CREATE TABLE T1 ( ID NUMBER, DESCRIPTION VARCHAR2(80)); INSERT INTO T1 SELECT CEIL(ABS(SIN(ROWNUM/9.9999)*10000)), 'This is the long description for this number '|| TO_CHAR(CEIL(ABS(SIN(ROWNUM/9.9999)*10000))) FROM (SELECT ROWNUM RN FROM DUAL CONNECT BY LEVEL<=10000), (SELECT ROWNUM RN FROM DUAL CONNECT BY LEVEL<=10000); COMMIT; CREATE INDEX IND_T1 ON T1(ID); EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE, METHOD_OPT=>'FOR ALL COLUMNS SIZE 1')
Once the test table is created, disconnect from the database and then reconnect to the database to reset the session-level statistic values. Note that this test is performed with the FILESYSTEMIO_OPTIONS parameter set to SETALL to enable direct, asychronous I/O. Next let’s determine the SID for the session:
SELECT SID FROM V$MYSTAT WHERE ROWNUM=1; SID ---- 552
Start up a second SQL*Plus connection to the database, and prepare to execute the following script (documentation references: V$OSSTAT, V$SESS_TIME_MODEL, V$SESSTAT):
SPOOL CPU_STATS.TXT SET PAGESIZE 2000 SET ARRAYSIZE 1000 SELECT STAT_NAME, VALUE FROM V$OSSTAT WHERE STAT_NAME IN ('SYS_TIME','USER_TIME','BUSY_TIME','IDLE_TIME','IOWAIT_TIME','NICE_TIME','RSRC_MGR_CPU_WAIT_TIME') ORDER BY STAT_NAME; SELECT STAT_NAME, VALUE FROM V$SESS_TIME_MODEL WHERE STAT_NAME='DB CPU' AND SID=552 ORDER BY STAT_NAME; SELECT SN.NAME, SS.VALUE FROM V$SESSTAT SS, V$STATNAME SN WHERE SS.SID=552 AND SS.STATISTIC#=SN.STATISTIC# AND (SN.NAME LIKE 'CPU%' OR SN.NAME LIKE '%cpu%' OR SN.NAME LIKE '%consistent%') ORDER BY SN.NAME; host sleep 60 SELECT STAT_NAME, VALUE FROM V$OSSTAT WHERE STAT_NAME IN ('SYS_TIME','USER_TIME','BUSY_TIME','IDLE_TIME','IOWAIT_TIME','NICE_TIME','RSRC_MGR_CPU_WAIT_TIME') ORDER BY STAT_NAME; SELECT STAT_NAME, VALUE FROM V$SESS_TIME_MODEL WHERE STAT_NAME='DB CPU' AND SID=552 ORDER BY STAT_NAME; SELECT SN.NAME, SS.VALUE FROM V$SESSTAT SS, V$STATNAME SN WHERE SS.SID=552 AND SS.STATISTIC#=SN.STATISTIC# AND (SN.NAME LIKE 'CPU%' OR SN.NAME LIKE '%cpu%' OR SN.NAME LIKE '%consistent%') ORDER BY SN.NAME; host sleep 60 ... SPOOL OFF
In the above script, repeat the three SQL statements (replacing 552 with the SID number returned by your query) between sets of host sleep 60 calls as frequently as required, possibly 30 times, so that the full run of the SQL statement that will be executed by the first session is captured (see page 213 of the “Expert Oracle Practices” book for a Windows equivalent of the Unix/Linux sleep command) .
In the first session, prepare to execute the following script:
SET AUTOTRACE TRACEONLY STATISTICS SET ARRAYSIZE 1000 SELECT /*+ INDEX(T1) */ ID, DESCRIPTION FROM T1 WHERE ID BETWEEN 1 AND 20;
The index hint is added in the above just to make certain that the IND_T1 index is used when we select 1.27% of the rows from the test table. Start the monitoring script in the second session and immediately start the above script in the first session.
When the first session completes, it should output something like the following (from the 11.2.0.1 test):
127325 rows selected. Statistics --------------------------------------------------- 1 recursive calls 0 db block gets 127794 consistent gets 121986 physical reads 0 redo size 7084492 bytes sent via SQL*Net to client 1757 bytes received via SQL*Net from client 129 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 127325 rows processed
Or the output might look something like the following (from the 11.1.0.6 test):
127325 rows selected. Statistics --------------------------------------------------- 154 recursive calls 0 db block gets 127727 consistent gets 129179 physical reads 0 redo size 7084492 bytes sent via SQL*Net to client 1757 bytes received via SQL*Net from client 129 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 127325 rows processed
If we take the spool file output that was generated by the second session and calculate the delta values for the statistics, we find the following statistic deltas were captured during each 60 second time period for the 11.2.0.1 test run:
STAT_NAME 1 2 3 4 5 6 7 8 9 10 ------------------------------------------------------ -------- -------- -------- -------- -------- -------- -------- -------- -------- -------- IDLE_TIME 48,854 48,753 48,672 48,645 47,920 47,666 48,425 48,652 48,648 48,664 BUSY_TIME 151 147 155 160 896 1,158 426 172 177 162 USER_TIME 29 21 29 32 731 1,001 285 43 48 53 SYS_TIME 55 55 57 54 96 91 69 58 57 49 IOWAIT_TIME 5,825 6,002 5,980 5,978 6,317 6,394 6,087 6,017 6,007 4,750 NICE_TIME 0 0 0 0 0 0 0 0 0 0 RSRC_MGR_CPU_WAIT_TIME 0 0 0 0 0 0 0 0 0 0 DB CPU 483,927 485,925 503,924 525,920 471,928 357,946 443,932 462,930 460,930 369,943 CPU used when call started 48 45 52 51 48 37 44 47 45 35 CPU used by this session 48 45 52 51 48 37 44 47 45 38 parse time cpu 0 0 0 0 0 0 0 0 0 0 recursive cpu usage 0 0 0 0 0 0 0 0 0 0 consistent gets 13,379 13,659 13,767 14,343 11,657 9,834 12,945 13,666 13,714 10,830 consistent gets from cache 13,379 13,659 13,767 14,343 11,657 9,834 12,945 13,666 13,714 10,830 no work - consistent read gets 13,377 13,659 13,767 14,343 11,657 9,834 12,945 13,666 13,714 10,830 consistent gets from cache (fastpath) 692 602 622 707 448 442 574 587 569 564 consistent gets - examination 2 0 0 0 0 0 0 0 0 0 consistent gets direct 0 0 0 0 0 0 0 0 0 0 consistent changes 0 0 0 0 0 0 0 0 0 0 transaction tables consistent reads - undo records applied 0 0 0 0 0 0 0 0 0 0 transaction tables consistent read rollbacks 0 0 0 0 0 0 0 0 0 0 data blocks consistent reads - undo records applied 0 0 0 0 0 0 0 0 0 0 cleanouts only - consistent read gets 0 0 0 0 0 0 0 0 0 0 rollbacks only - consistent read gets 0 0 0 0 0 0 0 0 0 0 cleanouts and rollbacks - consistent read gets 0 0 0 0 0 0 0 0 0 0
The 11.1.0.6 test run (executed after the 11.2.0.1 run completed without shutting down the 11.2.0.1 instance that was running on the same server) showed the following statistics:
STAT_NAME 1 2 3 4 ------------------------------------------------------ -------- -------- -------- -------- IDLE_TIME 48,470 48,536 48,547 48,568 BUSY_TIME 593 293 274 236 USER_TIME 345 47 42 35 SYS_TIME 117 87 83 59 IOWAIT_TIME 6,394 5,878 5,887 5,460 NICE_TIME 0 0 0 0 RSRC_MGR_CPU_WAIT_TIME 0 0 0 0 DB CPU 669,899 862,868 845,872 671,898 CPU used when call started 68 86 82 67 CPU used by this session 68 86 82 68 parse time cpu 2 0 0 0 recursive cpu usage 1 0 0 0 consistent gets 24,985 35,023 35,024 32,699 consistent gets from cache 24,985 35,023 35,024 32,699 no work - consistent read gets 24,972 35,023 35,024 32,699 consistent gets from cache (fastpath) 24,296 34,076 34,067 31,802 consistent gets - examination 13 0 0 0 consistent gets direct 0 0 0 0 consistent changes 0 0 0 0 transaction tables consistent reads - undo records applied 0 0 0 0 transaction tables consistent read rollbacks 0 0 0 0 data blocks consistent reads - undo records applied 0 0 0 0 cleanouts only - consistent read gets 0 0 0 0 rollbacks only - consistent read gets 0 0 0 0 cleanouts and rollbacks - consistent read gets 0 0 0 0
As evidenced by the above, the SQL statement completed in just less than 10 minutes in Oracle 11.2.0.1, while the SQL statement in 11.1.0.6 completed in just less than 4 minutes. Why the significant time difference? Maybe the above stats provide a clue?
The IDLE_TIME and BUSY_TIME statistics are server-wide with each of the server’s CPUs accumulating one second for every second of elapsed time, with the one second split between these two statistics. USER_TIME is a server-wide statistic that indicates how many CPU centiseconds were spent executing application/Oracle level activity. SYS_TIME is a server-wide statistic that indicates how many CPU centiseconds were spent executing operating system code (in kernel mode), typically on behalf of another program that usually accumulates CPU consumption time in the USER_TIME statistic. In the above, DB CPU is accumulated at the session level and must be divided by 1,000,000 to convert the values to seconds. The CPU used when call started and CPU used by this session statistics are at the session level and are reported in centiseconds. If we look at the first time period for the Oracle 11.1.0.6 execution, we are able to derive the following statistics:
- (48,470 idle centiseconds + 593 busy centiseconds) / (8 CPUs) = 6132.875 centiseconds = 61.33 seconds in the time period
- (593 / 100) = CPUs were used for 5.93 seconds of a total possible (61.33 * 8 ) CPU seconds in the time period
- (593 busy centiseconds) / (48,470 idle centiseconds + 593 busy centiseconds) * 100 = CPUs were 1.21% busy
- (345 user mode centiseconds) / (345 user mode centiseconds + 117 kernel mode centiseconds) * 100 = 74.68% of the CPU time was consumed in user mode
- (669,899 / 1,000,000) = Monitored session consumed 0.6699 seconds of CPU time in the 61.33 second time period
- (669,899 / 1,000,000) / (345 / 100) * 100 = Monitored session consumed 19.41% of the user mode CPU time on the server
- (68 / 100) = Session of interest consumed 0.68 seconds of CPU time in the 61.33 second time period
- (2 / 100) = 0.02 seconds session spent parsing the SQL statement
- (1 / 100) = 0.01 seconds session spent performing recursive calls to optimize the SQL statement
- (24,296 / 24,985) * 100 = 97.24% of the consistent gets were performed by fastpath (the statistic definition is apparently not documented – I suspect that it might mean that the blocks were read by direct path reads, thus bypassing the buffer cache – but I suppose that it might mean instead that multiple single-block read requests were combined into a single parallel read request)
The DB CPU statistic in the V$SESS_TIME_MODEL view and the various statistics in the V$OSSTAT view are updated in near real-time – the monitored session only consumed about 0.6 seconds of CPU time for every 60 seconds of execution time. You will notice that for both Oracle Database 11.2.0.1 and 11.1.0.6 the delta values for the CPU used when call started statistic and the CPU used by this session statistic appear as something other than the number 0 throughout the execution time, so the statistics must be updated at least once a minute in this test run, but that appears to contradict the results shown in Table 8-1 on page 201 of the “Expert Oracle Practices” book where those statistics were only updated 5 minutes and 35 seconds after a SQL statement was submitted (in the book, the CPU used by this session statistic was updated to show only the parse CPU time, until 5 minutes and 35 seconds elapsed). Is the book wrong, or are the above statistic deltas wrong, or maybe there is another explanation?
Fortunately, while the SQL statement executed, I also captured several statistic delta values from V$SQL for the SQL statement using my Toy Project for Performance Tuning, but a script like that found on page 216 of the “Expert Oracle Practices” book will also collect similar statistics. The captured deltas for the SQL statement on Oracle 11.2.0.1 are as follows (captured roughly every 60 seconds):
CPU Time Elapsed Time Parse Calls Executions Fetches Users Opening Users Executing Invalidations Loads Disk Reads Buffer Gets 0.47 56.84 1 1 13 1 1 0 1 12,148 12,840 0.50 63.62 0 0 15 0 1 0 0 13,648 14,251 0.52 64.14 0 0 14 0 1 0 0 13,801 14,514 0.54 61.81 0 0 15 0 1 0 0 13,922 14,572 0.45 61.15 0 0 11 0 1 0 0 10,598 11,060 0.44 69.25 0 0 12 0 1 0 0 11,559 12,036 0.47 64.16 0 0 14 0 1 0 0 13,318 14,041 0.48 62.28 0 0 14 0 1 0 0 13,439 14,071 0.47 64.97 0 0 14 0 1 0 0 13,823 14,386 0.20 26.24 0 0 7 0 0 0 0 5,730 6,023
There were roughly 13 fetches in the first minute, roughly 15 fetches in the second minute, 14 fetches in the third minute, etc. The statistics in V$SQL are updated at least once every 6 seconds, so the displayed CPU time values (shown in seconds) are reasonably close to actual CPU consumption times. You might recognize that the above CPU Time delta values are very close to the delta values shown for the CPU used by this session statistic, so does that mean that the CPU used by this session statistic is updated every 6 seconds or less? That brings us back to the question, what is considered by the documentation as a “user call” in regard to these two statistics? Given the above evidence, a “user call” in this context actually includes fetch calls.
Because of the inconsistency with which the CPU used by this session statistic is updated (it might not be updated for more than an hour while executing certain SQL statements), it is a better idea to use the DB CPU statistic found in the V$SESS_TIME_MODEL view (available starting with the Oracle 10.1 release) when trying to determine how much CPU time a particular session is consuming. Of course operating system tools such as ps and top or my VBS script for Windows may also be used to determine which sessions are consuming CPU time. The V$SESSION and V$SESSION_WAIT views are also able to help you determine which sessions are currently consuming CPU time and roughly how much CPU time has been consumed since the last wait event ended.
But a parallel question remains, why is a particular session consuming CPU time (the above test case is not necessarily the best example for showing uncontrolled CPU usage, try this test case with the OPTIMIZER_FEATURES_ENABLE parameter set to 10.2.0.4 or lower at the session level)? The V$SESSION_LONGOPS and V$SQL_MONITOR (introduced in Oracle 11.1, requires a Tuning Pack license to access) views might provide enough information to determine why a session is consuming CPU time. A 10046 extended SQL trace could also potentially show why a session is consuming an excessive amount of CPU time, unless, of course, nothing is written to the trace files because the session is performing a CPU intensive operation completely in memory. What then? Multiple ORADEBUG SHORT_STACK traces (there is a small chance that creating one of these dumps will crash the session) for the session with the help of Metalink Doc ID 175982.1 could provide a clue of what is happening in the session. If that does not help find the cause of the CPU usage, you could just declare that the Oracle Wait Interface is Useless (Sometimes) and move along to something more productive (I suggest reviewing that blog series).
Added March 7, 2010:
I thought that the laptop was running a bit sluggish today. It has been in and out of hibernation on a daily basis for the last two weeks. I thought that I would check the CPU speed just for fun – this is what I received on 2 checks of the performance with the Power Plan set to High Performance and with the laptop plugged into the electrical power:
I guess that explains why this darn laptop feels so slow – as I write this update to the blog article I see that the core frequency has again dropped:
Is Intel trying to strong arm me into buying one of those Core i7 920QX laptops, or is Microsoft trying to get me to upgrade to Windows 7 on this laptop?
—————-
Update August 10, 2010:
The CPU speed has dipped to 0.210GHz several times, possibly caused by overheating:
——
I think that I finally have a workable solution:
[…] 2-When does Oracle update CPU usage statistics ? Charles Hooper-CPU Usage Monitoring – What are the Statistics and when are the Statistics Updated? […]
[…] While developing “hog” I benefited & borrowed ideas from these excellent sources: Craig Shallahamer’s OP Load Generator Karl Arao’s cputoolkit Kyle Hailey’s Oracle CPU Time Charles Hooper’s CPU Usage Monitoring – What are the Statistics and when are the Statist… […]