CPU Run Queue – What is Wrong with this Quote?

14 06 2010

June 14, 2010

I found another interesting couple of lines in the June 2010 printing of the “Oracle Performance Firefighting” book.  This quote is from page 116:

“While it may seem strange, the run queue reported by the operating system includes processes waiting to be serviced by a CPU as well as processes currently being serviced by a CPU. This is why you may have heard it’s OK to have the run queue up to the number of CPUs or CPU cores.”

Also, this quote from page 123:

“Have you ever heard someone say, ‘Our CPUs are not balanced. We’ve got to get that fixed.’? You might have heard this if you’re an operating system kernel developer or work for Intel, but not as an Oracle DBA. This is because there is a single CPU run queue, and any available core can service the next transaction.”

Also, this quote from page 136:

“The other area of misrepresentation has to do with time in the CPU run queue. When Oracle reports that a process has consumed 10 ms of CPU time, Oracle does not know if the process actually consumed 10 ms of CPU time or if the process first waited in the CPU run queue for 5 ms and then received 5 ms of CPU time.”

Interesting… regarding the first quote – most of what I have read about the CPU run queue seemed to indicate that the process was removed from the run queue when the process is running on the CPU, and then re-inserted into the run queue when the process stops executing on the CPU (assuming that the process has not terminated and is not suspended).  The “Oracle Performance Firefighting” book lacks a test case to demonstrate that the above is true, so I put together a test case using the CPU load generators on page 197 of the “Expert Oracle Practices” book, the Linux sar command, and a slightly modified version (set to refresh every 30 seconds rather than every second) of the WMI script on pages 198-200 of the “Expert Oracle Practices” book.

For the test, I will use the following command on Linux:

sar -q 30 10

Immediately after the above command is started, a copy of the Linux version of the CPU load generator will be run (the load generator runs for 10 minutes and then exits):

#!/bin/bash
i=0
STime=`date +%s`

while [ `date +%s` -lt $(($STime+$((600)))) ]; do
  i=i+0.000001
done

Every time a new line is written by the sar utility another copy of the CPU load generator is started.  For the first test run I manually launched a new command line from the GUI and then started the script.  For the second test run I first opened as many command line windows as necessary, and prepared each to execute the script.  Here is the output (the runq-sz column shows the run queue):

[root@airforce-5 /]# sar -q 30 10
Linux 2.6.18-128.el5 (airforce-5.test.com)      06/13/2010

05:39:16 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
05:39:46 PM         1       228      0.76      0.37      0.16
05:40:16 PM         2       230      1.04      0.48      0.21
05:40:46 PM         3       232      1.31      0.59      0.25
05:41:16 PM         4       233      1.86      0.79      0.33
05:41:46 PM         6       237      2.81      1.11      0.45
05:42:16 PM         7       241      3.71      1.48      0.59
05:42:46 PM         9       244      5.56      2.14      0.84
05:43:16 PM        12       247      7.86      3.00      1.16
05:43:46 PM        16       250     10.29      4.04      1.56
05:44:16 PM        14       250     12.03      5.06      1.98
Average:            7       239      4.72      1.91      0.75

[root@airforce-5 /]# sar -q 30 10
Linux 2.6.18-128.el5 (airforce-5.test.com)      06/13/2010

05:50:53 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
05:51:23 PM         1       237      0.54      3.41      2.76
05:51:53 PM         3       239      1.44      3.35      2.76
05:52:23 PM         3       241      2.20      3.35      2.78
05:52:53 PM         5       242      3.31      3.51      2.85
05:53:23 PM         7       245      4.53      3.78      2.96
05:53:53 PM        10       247      6.40      4.31      3.16
05:54:23 PM        13       249      8.60      5.03      3.43
05:54:53 PM        13       249     10.71      5.87      3.76
05:55:23 PM        16       253     12.16      6.68      4.10
05:55:53 PM        14       251     13.02      7.41      4.42
Average:            8       245      6.29      4.67      3.30

For the sake of comparison, here is a CPU load generator script that executes on Windows that performs the same operation as the script which was executed on Linux:

Dim i
Dim dteStartTime

dteStartTime = Now

Do While DateDiff("n", dteStartTime, Now) < 10
  i = i + 0.000001
Loop

Let’s use the WMI script in place of the Linux sar command and repeat the test.  The WMI script will be started, the CPU load generator script will be started, and every time the WMI script outputs a line another copy of the CPU load generator script will be started.  Here is the output from the WMI script (the Q. Length column shows the run queue):

6/13/2010 6:29:27 PM Processes: 53 Threads: 825 C. Switches: 1757840 Q. Length: 0
6/13/2010 6:29:57 PM Processes: 54 Threads: 826 C. Switches: 32912 Q. Length: 0
6/13/2010 6:30:27 PM Processes: 56 Threads: 831 C. Switches: 71766 Q. Length: 0
6/13/2010 6:30:57 PM Processes: 58 Threads: 836 C. Switches: 39857 Q. Length: 0
6/13/2010 6:31:27 PM Processes: 59 Threads: 830 C. Switches: 33946 Q. Length: 0
6/13/2010 6:31:57 PM Processes: 59 Threads: 821 C. Switches: 27955 Q. Length: 1
6/13/2010 6:32:27 PM Processes: 61 Threads: 830 C. Switches: 32088 Q. Length: 0
6/13/2010 6:32:57 PM Processes: 63 Threads: 826 C. Switches: 27027 Q. Length: 0
6/13/2010 6:33:29 PM Processes: 64 Threads: 827 C. Switches: 22910 Q. Length: 3
6/13/2010 6:34:01 PM Processes: 66 Threads: 836 C. Switches: 22936 Q. Length: 4
6/13/2010 6:34:34 PM Processes: 68 Threads: 839 C. Switches: 34076 Q. Length: 5
6/13/2010 6:35:07 PM Processes: 70 Threads: 840 C. Switches: 25564 Q. Length: 8

What, if anything, is wrong with the above quotes from the book?  The comments in this article might be helpful.

The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true.  It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past).  If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.








Follow

Get every new post delivered to your Inbox.

Join 144 other followers