CPU Usage Monitoring – What are the Statistics and when are the Statistics Updated?

22 02 2010

February 22, 2010 (Updated March 7, 2010, August 10, 2010)

As mentioned in Chapter 8 of the “Expert Oracle Practices” book, there are several ways to measure the amount of CPU time used, whether at the operating system level, at the instance-wide level, at the session level, or at the SQL statement level.  In a previous blog article I described why driving the database server’s CPUs to 100% utilization was a bad idea and provided quotes from several articles that disagreed as well as several that agreed with my statements (Cary Millsap’s follow-up comment should be read carefully).

I mentioned in the previous blog article that some CPUs will slow their core frequences if thermal thresholds are reached, which might be possible by pegging the CPU at 100% utilization for long periods of time.  Intel offers the Intel Processor Identification Utility for the Windows platform (similar utilities for other platforms and non-Intel processors probably also exist) that shows the rated processor speed and the current CPU speed for each CPU/core in the server.  For example, the following is the output from the Intel program when run on my three year old laptop:

Here is the same laptop, captured in a different time period.  This particular slow down was not caused by a thermal threshold problem, rather it was created by setting the Windows power plan to Power Saver:

In the above, notice that the reported core speed for both CPU cores is 50% of the expected CPU speed, and even the system bus speed dropped slightly (I have seen lower CPU speeds and system bus speeds when the computer wakes from hibernation mode).  (Edit: See the pictures at the end of the article for examples of slower speeds 8+ hours after coming out of hibernation mode.)

Let’s jump into Oracle now (we will use Oracle Database 11.2.0.1 and 11.1.0.6).  There are several statistics in the database views that report CPU utilization.  One such statistic is CPU used by this session.  This is a potentially challenging statistic to use.  From the documentation:

CPU used by this session: Amount of CPU time (in 10s of milliseconds) used by a session from the time a user call starts until it ends. If a user call completes within 10 milliseconds, the start and end user-call time are the same for purposes of this statistics, and 0 milliseconds are added. A similar problem can exist in the reporting by the operating system, especially on systems that suffer from many context switches.

CPU used when call started: The CPU time used when the call is started

If we try to understand the above, the CPU used by this session statistic is the time, in centiseconds, of CPU utilization while a user call is running on the server’s CPUs.  The second statistic, CPU used when call started, indicates the value of the CPU used by this session statistic when the user call was started.  The documentation leaves a couple of questions unanswered:

  • When are these statistics updated?  If the CPU used by this session statistic is updated in real-time (if you are wondering, it is Not), the CPU used when call started statistic could be used to determine how long the session has been on the CPU (and not in a wait event and not in one of the other CPU run states), but if the statistics are not updated in real time, what is the point of having two separate statistics?
  • What is considered a user call – is it the full execution of a user’s SQL statement, or is it a fetch of a block from disk, or is it the completion of a fetch of some of the result rows, or is a user call something entirely different?

We need a test case to see what is happening with these statistics, and several of the other statistics that measure CPU utilization (warning, this test table may take an hour or longer to create).

CREATE TABLE T1 (
  ID NUMBER,
  DESCRIPTION VARCHAR2(80));

INSERT INTO T1
SELECT
  CEIL(ABS(SIN(ROWNUM/9.9999)*10000)),
  'This is the long description for this number '|| TO_CHAR(CEIL(ABS(SIN(ROWNUM/9.9999)*10000)))
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000),
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000);

COMMIT;

CREATE INDEX IND_T1 ON T1(ID);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE, METHOD_OPT=>'FOR ALL COLUMNS SIZE 1')

Once the test table is created, disconnect from the database and then reconnect to the database to reset the session-level statistic values.  Note that this test is performed with the FILESYSTEMIO_OPTIONS parameter set to SETALL to enable direct, asychronous I/O.  Next let’s determine the SID for the session:

SELECT
  SID
FROM
  V$MYSTAT
WHERE
  ROWNUM=1;

 SID
----
 552

Start up a second SQL*Plus connection to the database, and prepare to execute the following script (documentation references: V$OSSTAT, V$SESS_TIME_MODEL, V$SESSTAT):

SPOOL CPU_STATS.TXT
SET PAGESIZE 2000
SET ARRAYSIZE 1000

SELECT
  STAT_NAME,
  VALUE
FROM
  V$OSSTAT
WHERE
  STAT_NAME IN ('SYS_TIME','USER_TIME','BUSY_TIME','IDLE_TIME','IOWAIT_TIME','NICE_TIME','RSRC_MGR_CPU_WAIT_TIME')
ORDER BY
  STAT_NAME;

SELECT
  STAT_NAME,
  VALUE
FROM
  V$SESS_TIME_MODEL
WHERE
  STAT_NAME='DB CPU'
  AND SID=552
ORDER BY
  STAT_NAME;

SELECT
  SN.NAME,
  SS.VALUE
FROM
  V$SESSTAT SS,
  V$STATNAME SN
WHERE
  SS.SID=552
  AND SS.STATISTIC#=SN.STATISTIC#
  AND (SN.NAME LIKE 'CPU%'
       OR SN.NAME LIKE '%cpu%'
       OR SN.NAME LIKE '%consistent%')
ORDER BY
  SN.NAME;

host sleep 60

SELECT
  STAT_NAME,
  VALUE
FROM
  V$OSSTAT
WHERE
  STAT_NAME IN ('SYS_TIME','USER_TIME','BUSY_TIME','IDLE_TIME','IOWAIT_TIME','NICE_TIME','RSRC_MGR_CPU_WAIT_TIME')
ORDER BY
  STAT_NAME;

SELECT
  STAT_NAME,
  VALUE
FROM
  V$SESS_TIME_MODEL
WHERE
  STAT_NAME='DB CPU'
  AND SID=552
ORDER BY
  STAT_NAME;

SELECT
  SN.NAME,
  SS.VALUE
FROM
  V$SESSTAT SS,
  V$STATNAME SN
WHERE
  SS.SID=552
  AND SS.STATISTIC#=SN.STATISTIC#
  AND (SN.NAME LIKE 'CPU%'
       OR SN.NAME LIKE '%cpu%'
       OR SN.NAME LIKE '%consistent%')
ORDER BY
  SN.NAME;

host sleep 60

...

SPOOL OFF

In the above script, repeat the three SQL statements (replacing 552 with the SID number returned by your query) between sets of host sleep 60 calls as frequently as required, possibly 30 times, so that the full run of the SQL statement that will be executed by the first session is captured (see page 213 of the “Expert Oracle Practices” book for a Windows equivalent of the Unix/Linux sleep command) .

In the first session, prepare to execute the following script:

SET AUTOTRACE TRACEONLY STATISTICS
SET ARRAYSIZE 1000

SELECT /*+ INDEX(T1) */
  ID,
  DESCRIPTION
FROM
  T1
WHERE
  ID BETWEEN 1 AND 20;

The index hint is added in the above just to make certain that the IND_T1 index is used when we select 1.27% of the rows from the test table.  Start the monitoring script in the second session and immediately start the above script in the first session. 

When the first session completes, it should output something like the following (from the 11.2.0.1 test):

127325 rows selected.

Statistics
---------------------------------------------------
          1  recursive calls
          0  db block gets
     127794  consistent gets
     121986  physical reads
          0  redo size
    7084492  bytes sent via SQL*Net to client
       1757  bytes received via SQL*Net from client
        129  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     127325  rows processed

Or the output might look something like the following (from the 11.1.0.6 test):

127325 rows selected.

Statistics
---------------------------------------------------
        154  recursive calls
          0  db block gets
     127727  consistent gets
     129179  physical reads
          0  redo size
    7084492  bytes sent via SQL*Net to client
       1757  bytes received via SQL*Net from client
        129  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     127325  rows processed

If we take the spool file output that was generated by the second session and calculate the delta values for the statistics, we find the following statistic deltas were captured during each 60 second time period for the 11.2.0.1 test run:

STAT_NAME                                                        1        2        3        4        5        6        7        8        9       10
------------------------------------------------------    -------- -------- -------- -------- -------- -------- -------- -------- -------- --------
IDLE_TIME                                                   48,854   48,753   48,672   48,645   47,920   47,666   48,425   48,652   48,648   48,664
BUSY_TIME                                                      151      147      155      160      896    1,158      426      172      177      162
USER_TIME                                                       29       21       29       32      731    1,001      285       43       48       53
SYS_TIME                                                        55       55       57       54       96       91       69       58       57       49
IOWAIT_TIME                                                  5,825    6,002    5,980    5,978    6,317    6,394    6,087    6,017    6,007    4,750
NICE_TIME                                                        0        0        0        0        0        0        0        0        0        0
RSRC_MGR_CPU_WAIT_TIME                                           0        0        0        0        0        0        0        0        0        0

DB CPU                                                     483,927  485,925  503,924  525,920  471,928  357,946  443,932  462,930  460,930  369,943

CPU used when call started                                      48       45       52       51       48       37       44       47       45       35
CPU used by this session                                        48       45       52       51       48       37       44       47       45       38
parse time cpu                                                   0        0        0        0        0        0        0        0        0        0
recursive cpu usage                                              0        0        0        0        0        0        0        0        0        0
consistent gets                                             13,379   13,659   13,767   14,343   11,657    9,834   12,945   13,666   13,714   10,830
consistent gets from cache                                  13,379   13,659   13,767   14,343   11,657    9,834   12,945   13,666   13,714   10,830
no work - consistent read gets                              13,377   13,659   13,767   14,343   11,657    9,834   12,945   13,666   13,714   10,830
consistent gets from cache (fastpath)                          692      602      622      707      448      442      574      587      569      564
consistent gets - examination                                    2        0        0        0        0        0        0        0        0        0
consistent gets direct                                           0        0        0        0        0        0        0        0        0        0
consistent changes                                               0        0        0        0        0        0        0        0        0        0
transaction tables consistent reads - undo records applied       0        0        0        0        0        0        0        0        0        0
transaction tables consistent read rollbacks                     0        0        0        0        0        0        0        0        0        0
data blocks consistent reads - undo records applied              0        0        0        0        0        0        0        0        0        0
cleanouts only - consistent read gets                            0        0        0        0        0        0        0        0        0        0
rollbacks only - consistent read gets                            0        0        0        0        0        0        0        0        0        0
cleanouts and rollbacks - consistent read gets                   0        0        0        0        0        0        0        0        0        0

The 11.1.0.6 test run (executed after the 11.2.0.1 run completed without shutting down the 11.2.0.1 instance that was running on the same server) showed the following statistics:

STAT_NAME                                                        1        2        3        4
------------------------------------------------------    -------- -------- -------- --------
IDLE_TIME                                                   48,470   48,536   48,547   48,568
BUSY_TIME                                                      593      293      274      236
USER_TIME                                                      345       47       42       35
SYS_TIME                                                       117       87       83       59
IOWAIT_TIME                                                  6,394    5,878    5,887    5,460
NICE_TIME                                                        0        0        0        0
RSRC_MGR_CPU_WAIT_TIME                                           0        0        0        0

DB CPU                                                     669,899  862,868  845,872  671,898

CPU used when call started                                      68       86       82       67
CPU used by this session                                        68       86       82       68
parse time cpu                                                   2        0        0        0
recursive cpu usage                                              1        0        0        0
consistent gets                                             24,985   35,023   35,024   32,699
consistent gets from cache                                  24,985   35,023   35,024   32,699
no work - consistent read gets                              24,972   35,023   35,024   32,699
consistent gets from cache (fastpath)                       24,296   34,076   34,067   31,802
consistent gets - examination                                   13        0        0        0
consistent gets direct                                           0        0        0        0
consistent changes                                               0        0        0        0
transaction tables consistent reads - undo records applied       0        0        0        0
transaction tables consistent read rollbacks                     0        0        0        0
data blocks consistent reads - undo records applied              0        0        0        0
cleanouts only - consistent read gets                            0        0        0        0
rollbacks only - consistent read gets                            0        0        0        0
cleanouts and rollbacks - consistent read gets                   0        0        0        0

As evidenced by the above, the SQL statement completed in just less than 10 minutes in Oracle 11.2.0.1, while the SQL statement in 11.1.0.6 completed in just less than 4 minutes.  Why the significant time difference?  Maybe the above stats provide a clue?

The IDLE_TIME and BUSY_TIME statistics are server-wide with each of the server’s CPUs accumulating one second for every second of elapsed time, with the one second split between these two statistics.  USER_TIME is a server-wide statistic that indicates how many CPU centiseconds were spent executing application/Oracle level activity.  SYS_TIME is a server-wide statistic that indicates how many CPU centiseconds were spent executing operating system code (in kernel mode), typically on behalf of another program that usually accumulates CPU consumption time in the USER_TIME statistic.  In the above, DB CPU is accumulated at the session level and must be divided by 1,000,000 to convert the values to seconds.  The CPU used when call started and CPU used by this session statistics are at the session level and are reported in centiseconds.  If we look at the first time period for the Oracle 11.1.0.6 execution, we are able to derive the following statistics:

  • (48,470 idle centiseconds + 593 busy centiseconds) / (8 CPUs)  = 6132.875 centiseconds = 61.33 seconds in the time period
  • (593 / 100) = CPUs were used for 5.93 seconds of a total possible (61.33 * 8 ) CPU seconds in the time period
  • (593 busy centiseconds) / (48,470 idle centiseconds + 593 busy centiseconds) * 100 = CPUs were 1.21% busy
  • (345 user mode centiseconds) / (345 user mode centiseconds + 117 kernel mode centiseconds) * 100 = 74.68% of the CPU time was consumed in user mode
  • (669,899 / 1,000,000) = Monitored session consumed 0.6699 seconds of CPU time in the 61.33 second time period
  • (669,899 / 1,000,000) / (345 / 100) * 100 = Monitored session consumed 19.41% of the user mode CPU time on the server
  • (68 / 100) = Session of interest consumed 0.68 seconds of CPU time in the 61.33 second time period
  • (2 / 100) = 0.02 seconds session spent parsing the SQL statement
  • (1 / 100) = 0.01 seconds session spent performing recursive calls to optimize the SQL statement
  • (24,296 / 24,985) * 100 = 97.24% of the consistent gets were performed by fastpath (the statistic definition is apparently not documented – I suspect that it might mean that the blocks were read by direct path reads, thus bypassing the buffer cache – but I suppose that it might mean instead that multiple single-block read requests were combined into a single parallel read request)

The DB CPU statistic in the V$SESS_TIME_MODEL view and the various statistics in the V$OSSTAT view are updated in near real-time – the monitored session only consumed about 0.6 seconds of CPU time for every 60 seconds of execution time.  You will notice that for both Oracle Database 11.2.0.1 and 11.1.0.6 the delta values for the CPU used when call started statistic and the CPU used by this session statistic appear as something other than the number 0 throughout the execution time, so the statistics must be updated at least once a minute in this test run, but that appears to contradict the results shown in Table 8-1 on page 201 of the “Expert Oracle Practices” book where those statistics were only updated 5 minutes and 35 seconds after a SQL statement was submitted (in the book, the CPU used by this session statistic was updated to show only the parse CPU time, until 5 minutes and 35 seconds elapsed).  Is the book wrong, or are the above statistic deltas wrong, or maybe there is another explanation?

Fortunately, while the SQL statement executed, I also captured several statistic delta values from V$SQL for the SQL statement using my Toy Project for Performance Tuning, but a script like that found on page 216 of the “Expert Oracle Practices” book will also collect similar statistics.  The captured deltas for the SQL statement on Oracle 11.2.0.1 are as follows (captured roughly every 60 seconds):

CPU Time  Elapsed Time  Parse Calls  Executions  Fetches  Users Opening  Users Executing  Invalidations  Loads  Disk Reads  Buffer Gets
    0.47         56.84            1           1       13              1                1              0      1      12,148       12,840
    0.50         63.62            0           0       15              0                1              0      0      13,648       14,251
    0.52         64.14            0           0       14              0                1              0      0      13,801       14,514
    0.54         61.81            0           0       15              0                1              0      0      13,922       14,572
    0.45         61.15            0           0       11              0                1              0      0      10,598       11,060
    0.44         69.25            0           0       12              0                1              0      0      11,559       12,036
    0.47         64.16            0           0       14              0                1              0      0      13,318       14,041
    0.48         62.28            0           0       14              0                1              0      0      13,439       14,071
    0.47         64.97            0           0       14              0                1              0      0      13,823       14,386
    0.20         26.24            0           0        7              0                0              0      0       5,730        6,023

There were roughly 13 fetches in the first minute, roughly 15 fetches in the second minute, 14 fetches in the third minute, etc.  The statistics in V$SQL are updated at least once every 6 seconds, so the displayed CPU time values (shown in seconds) are reasonably close to actual CPU consumption times.  You might recognize that the above CPU Time delta values are very close to the delta values shown for the CPU used by this session statistic, so does that mean that the CPU used by this session statistic is updated every 6 seconds or less?  That brings us back to the question, what is considered by the documentation as a “user call” in regard to these two statistics?  Given the above evidence, a “user call” in this context actually includes fetch calls.

Because of the inconsistency with which the CPU used by this session statistic is updated (it might not be updated for more than an hour while executing certain SQL statements), it is a better idea to use the DB CPU statistic found in the V$SESS_TIME_MODEL view (available starting with the Oracle 10.1 release) when trying to determine how much CPU time a particular session is consuming.  Of course operating system tools such as ps and top or my VBS script for Windows may also be used to determine which sessions are consuming CPU time.  The V$SESSION and V$SESSION_WAIT views are also able to help you determine which sessions are currently consuming CPU time and roughly how much CPU time has been consumed since the last wait event ended.

But a parallel question remains, why is a particular session consuming CPU time (the above test case is not necessarily the best example for showing uncontrolled CPU usage, try this test case with the OPTIMIZER_FEATURES_ENABLE parameter set to 10.2.0.4 or lower at the session level)?  The  V$SESSION_LONGOPS and V$SQL_MONITOR (introduced in Oracle 11.1, requires a Tuning Pack license to access) views might provide enough information to determine why a session is consuming CPU time.  A 10046 extended SQL trace could also potentially show why a session is consuming an excessive amount of CPU time, unless, of course, nothing is written to the trace files because the session is performing a CPU intensive operation completely in memory.  What then?  Multiple ORADEBUG SHORT_STACK traces (there is a small chance that creating one of these dumps will crash the session) for the session with the help of Metalink Doc ID 175982.1 could provide a clue of what is happening in the session.  If that does not help find the cause of the CPU usage, you could just declare that the Oracle Wait Interface is Useless (Sometimes) and move along to something more productive (I suggest reviewing that blog series).

Added March 7, 2010:
I thought that the laptop was running a bit sluggish today.  It has been in and out of hibernation on a daily basis for the last two weeks.  I thought that I would check the CPU speed just for fun – this is what I received on 2 checks of the performance with the Power Plan set to High Performance and with the laptop plugged into the electrical power:

I guess that explains why this darn laptop feels so slow – as I write this update to the blog article I see that the core frequency has again dropped:

Is Intel trying to strong arm me into buying one of those Core i7 920QX laptops, or is Microsoft trying to get me to upgrade to Windows 7 on this laptop?

—————-

Update August 10, 2010:

The CPU speed has dipped to 0.210GHz several times, possibly caused by overheating:

——

I think that I finally have a workable solution:


Actions

Information

One response

20 03 2010
Blogroll Report 19/02/2010 – 26/02/2010 « Coskan’s Approach to Oracle

[...] 2-When does Oracle update CPU usage statistics ? Charles Hooper-CPU Usage Monitoring – What are the Statistics and when are the Statistics Updated? [...]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Follow

Get every new post delivered to your Inbox.

Join 141 other followers

%d bloggers like this: