Network Monitoring Experimentations 5

April 27, 2010 (Updated May 13, 2010)

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Previous articles in this series showed several interesting capabilities that Wireshark and similar packet capture utilities offer, such as seeing the behind-the-scenes SQL statements that do not appear in a 10046 trace when Autotrace is enabled in SQL*Plus.  But Wireshark offers more to assist with the analysis of Oracle data transfers across the network.  Today's article explores some of these additional features while experimenting with different array fetch sizes, using Oracle Database 11.2.0.1 with default listener settings on the server side and Oracle client 11.1.0.7 on the client side, with an 802.11g wireless network between the two computers (greater network packet latency than a wired connection).

The test table and SQL statement used in this article are similar to those used in a previous article, which demonstrated that the execution plan on Oracle Database 11.2.0.1 differed from the plan used on earlier releases; that difference may affect your ability to reproduce today's test results if you are using an older release of Oracle Database.

CREATE TABLE T5 (
  C1 VARCHAR2(10),
  C2 VARCHAR2(100),
  PRIMARY KEY (C1));

INSERT INTO T5 NOLOGGING
SELECT
  'A'||TO_CHAR(ROWNUM,'0000000'),
  RPAD('A',100,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T5',CASCADE=>TRUE)

The test script that will be used is as follows, executed after Wireshark packet capturing is enabled on the client computer:

SET AUTOTRACE TRACEONLY STATISTICS

SET ARRAYSIZE 10

SELECT /*+ USE_NL(T51 T52 T53 T54 T55 T56 T57 T58) */
  T51.C1,
  T51.C2,
  T52.C1,
  T52.C2,
  T53.C1,
  T53.C2,
  T54.C1,
  T54.C2,
  T55.C1,
  T55.C2,
  T56.C1,
  T56.C2,
  T57.C1,
  T57.C2,
  T58.C1,
  T58.C2
FROM
  T5 T51,
  T5 T52,
  T5 T53,
  T5 T54,
  T5 T55,
  T5 T56,
  T5 T57,
  T5 T58
WHERE
  T51.C1=T52.C1
  AND T51.C1=T53.C1
  AND T51.C1=T54.C1
  AND T51.C1=T55.C1
  AND T51.C1=T56.C1
  AND T51.C1=T57.C1
  AND T51.C1=T58.C1
  AND T51.C1 BETWEEN 'A 0001000' AND 'A 1000000';

host sleep 5

SET ARRAYSIZE 50

SELECT /*+ USE_NL(T51 T52 T53 T54 T55 T56 T57 T58) */
  T51.C1,
  T51.C2,
  T52.C1,
  T52.C2,
  T53.C1,
  T53.C2,
  T54.C1,
  T54.C2,
  T55.C1,
  T55.C2,
  T56.C1,
  T56.C2,
  T57.C1,
  T57.C2,
  T58.C1,
  T58.C2
FROM
  T5 T51,
  T5 T52,
  T5 T53,
  T5 T54,
  T5 T55,
  T5 T56,
  T5 T57,
  T5 T58
WHERE
  T51.C1=T52.C1
  AND T51.C1=T53.C1
  AND T51.C1=T54.C1
  AND T51.C1=T55.C1
  AND T51.C1=T56.C1
  AND T51.C1=T57.C1
  AND T51.C1=T58.C1
  AND T51.C1 BETWEEN 'A 0001000' AND 'A 1000000';

host sleep 5

SET ARRAYSIZE 100

SELECT /*+ USE_NL(T51 T52 T53 T54 T55 T56 T57 T58) */
  T51.C1,
  T51.C2,
  T52.C1,
  T52.C2,
  T53.C1,
  T53.C2,
  T54.C1,
  T54.C2,
  T55.C1,
  T55.C2,
  T56.C1,
  T56.C2,
  T57.C1,
  T57.C2,
  T58.C1,
  T58.C2
FROM
  T5 T51,
  T5 T52,
  T5 T53,
  T5 T54,
  T5 T55,
  T5 T56,
  T5 T57,
  T5 T58
WHERE
  T51.C1=T52.C1
  AND T51.C1=T53.C1
  AND T51.C1=T54.C1
  AND T51.C1=T55.C1
  AND T51.C1=T56.C1
  AND T51.C1=T57.C1
  AND T51.C1=T58.C1
  AND T51.C1 BETWEEN 'A 0001000' AND 'A 1000000';

host sleep 5

SET ARRAYSIZE 500

SELECT /*+ USE_NL(T51 T52 T53 T54 T55 T56 T57 T58) */
  T51.C1,
  T51.C2,
  T52.C1,
  T52.C2,
  T53.C1,
  T53.C2,
  T54.C1,
  T54.C2,
  T55.C1,
  T55.C2,
  T56.C1,
  T56.C2,
  T57.C1,
  T57.C2,
  T58.C1,
  T58.C2
FROM
  T5 T51,
  T5 T52,
  T5 T53,
  T5 T54,
  T5 T55,
  T5 T56,
  T5 T57,
  T5 T58
WHERE
  T51.C1=T52.C1
  AND T51.C1=T53.C1
  AND T51.C1=T54.C1
  AND T51.C1=T55.C1
  AND T51.C1=T56.C1
  AND T51.C1=T57.C1
  AND T51.C1=T58.C1
  AND T51.C1 BETWEEN 'A 0001000' AND 'A 1000000';

host sleep 5

SET ARRAYSIZE 1000

SELECT /*+ USE_NL(T51 T52 T53 T54 T55 T56 T57 T58) */
  T51.C1,
  T51.C2,
  T52.C1,
  T52.C2,
  T53.C1,
  T53.C2,
  T54.C1,
  T54.C2,
  T55.C1,
  T55.C2,
  T56.C1,
  T56.C2,
  T57.C1,
  T57.C2,
  T58.C1,
  T58.C2
FROM
  T5 T51,
  T5 T52,
  T5 T53,
  T5 T54,
  T5 T55,
  T5 T56,
  T5 T57,
  T5 T58
WHERE
  T51.C1=T52.C1
  AND T51.C1=T53.C1
  AND T51.C1=T54.C1
  AND T51.C1=T55.C1
  AND T51.C1=T56.C1
  AND T51.C1=T57.C1
  AND T51.C1=T58.C1
  AND T51.C1 BETWEEN 'A 0001000' AND 'A 1000000';

host sleep 5

SET ARRAYSIZE 5000

SELECT /*+ USE_NL(T51 T52 T53 T54 T55 T56 T57 T58) */
  T51.C1,
  T51.C2,
  T52.C1,
  T52.C2,
  T53.C1,
  T53.C2,
  T54.C1,
  T54.C2,
  T55.C1,
  T55.C2,
  T56.C1,
  T56.C2,
  T57.C1,
  T57.C2,
  T58.C1,
  T58.C2
FROM
  T5 T51,
  T5 T52,
  T5 T53,
  T5 T54,
  T5 T55,
  T5 T56,
  T5 T57,
  T5 T58
WHERE
  T51.C1=T52.C1
  AND T51.C1=T53.C1
  AND T51.C1=T54.C1
  AND T51.C1=T55.C1
  AND T51.C1=T56.C1
  AND T51.C1=T57.C1
  AND T51.C1=T58.C1
  AND T51.C1 BETWEEN 'A 0001000' AND 'A 1000000';

Windows client computers do not have an operating system sleep command, so if the client is running on Windows, create a file named sleep.vbs in the current folder used by SQL*Plus with the following code:

' Sleep for the number of seconds passed on the command line, with a
' default of 600 seconds if no usable numeric argument is supplied.
Const lngSleepSecondsDefault = 600
Dim lngSleepSeconds
Dim objScriptArguments
Dim strArgument

Set objScriptArguments = WScript.Arguments

If objScriptArguments.Count = 0 Then
  lngSleepSeconds = lngSleepSecondsDefault
Else
  ' Use the first numeric command line argument found
  For Each strArgument In objScriptArguments
    If IsNumeric(strArgument) Then
      lngSleepSeconds = CLng(strArgument)
      Exit For
    End If
  Next
  ' Fall back to the default if no positive numeric argument was found
  If lngSleepSeconds <= 0 Then
    lngSleepSeconds = lngSleepSecondsDefault
  End If
End If

' WScript.Sleep expects milliseconds
WScript.Sleep lngSleepSeconds * 1000
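
Because .VBS is normally listed in the Windows PATHEXT environment variable, the host sleep 5 commands in the test script above should find sleep.vbs in the current folder automatically; if that does not happen, the script may be invoked explicitly (a usage sketch):

host cscript //nologo sleep.vbs 5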

Before enabling the Wireshark capture, execute the test SQL statement a couple of times to limit the potential for a cold cache and physical reads to unfairly penalize the first execution (the one with the small array fetch size) once packet logging is enabled.  Start the Wireshark capture on the client, and then execute the above test script in SQL*Plus on the client computer.  The client will fetch approximately 999,001 rows from the database, which will likely produce a Wireshark packet capture totaling about 600MB.  Processing the packet capture, as depicted later in this article, will certainly tax the memory limits of 32-bit operating systems and may result in Wireshark crashing (the latest beta version of Wireshark was used in this test).  Additionally, processing the resulting capture is extremely CPU intensive – you have been warned.
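
One way to keep the capture file smaller is to restrict the capture to the Oracle listener port at capture time, either in the Wireshark capture options or with the command-line tshark utility (a sketch assuming the default listener port of 1521; oracle_fetch_test.pcap is just a hypothetical output file name used in the examples that follow):

REM list the available capture interfaces, then capture only listener traffic
tshark -D
tshark -i 1 -f "tcp port 1521" -w oracle_fetch_test.pcap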

First, we will start the analysis by right-clicking one of the TNS type packets and selecting Follow TCP Stream.  The resulting window that is eventually displayed will not be used; we are simply using that feature to populate the Filter criteria shown near the top of the Wireshark window with the light green background in the picture below:
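
In this capture the automatically generated filter looked like the following (the stream number simply reflects how many TCP conversations appeared earlier in the capture):

tcp.stream eq 1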

As shown above, when a line is checked in the lower pane, such as Time delta from previous captured frame, the status bar at the bottom of the window is updated with a field name, in this case frame.time_delta.  If you manually review the Time column in the Wireshark window you will see that the time delta between packets varies quite a bit.  I wonder if the time delta values vary depending on the array fetch size… we could extract the information and process it in a spreadsheet package to determine whether the time deltas change.  More on this later.  The packet frame length for this packet is 704 bytes.
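
If you would rather perform that spreadsheet analysis, the relevant fields can be exported from the capture at the command line with tshark (a sketch reusing the hypothetical capture file name from above; deltas.csv is also just an example name):

REM export per-packet time deltas and frame lengths as comma separated values
tshark -r oracle_fetch_test.pcap -T fields -E header=y -E separator=, -e frame.number -e frame.time_delta -e frame.len > deltas.csv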

If we scroll down to the bottom of the lower pane, we see that Wireshark has attempted to decode this TNS packet – everything except the control characters and binary data in the data portion of the packet.

We see that the TNS packet length is 650 of the packet frame’s 704 bytes, and the actual data contained in the packet is 640 of those 650 bytes.  The status bar shows that the field name for the data length is data.len – we will use this information later.  The data section shows that this is the submission of the first SQL statement in our test script.
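
As a quick aside, a display filter similar to the following may be used to show only the TNS packets that actually carry a data payload (a sketch – tns is the protocol name used by Wireshark's TNS dissector, and data.len is the field we just located in the status bar):

tns && data.len > 0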

The following screenshot shows the server returning data to the client early in the execution of the test script.  In this case, with the array fetch size set to 10, the server had to break the selected data into two network packets, with the first containing bytes 0 through 1459 and the second containing bytes 1460 through 1760 (just 301 bytes in the second packet).

Scrolling down near the end of the packet capture, we see one of the network packets where the array fetch size of 5,000 was in use.  The displayed time delta is 0.000341 seconds, which is very good for a wireless network.

Moving down to the next packet, we find that this packet is 855 bytes in length, but the total reassembled TCP length is 8,155 bytes, compared with the 1,761 bytes that we found when the array fetch size of 10 was in use.

The field name for the reassembled TCP length, as displayed in the status bar, is tcp.reassembled.length – we will use this later.
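
Those reassembled lengths can likewise be extracted for later processing (a sketch; recent tshark builds accept -Y for display filters, while builds from this era used -R instead):

tshark -r oracle_fetch_test.pcap -Y "tcp.reassembled.length" -T fields -e frame.number -e tcp.reassembled.length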

Now for the exciting part, graphing the result.  If you had to make a choice of which array fetch size to use: 10, 50, 100, 500, 1000, or 5000, which should you use, and why?  From the Statistics menu in Wireshark, select the IO Graphs menu item.  You should see something like the following if you set the Tick Interval to 10 seconds and the Pixels per tick to 10.  The Tick Interval value determines how the statistics are grouped together, in this case into 10 second intervals.  The Pixels per tick value determines the horizontal spacing – the distance between the data points when displayed on screen.
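
Incidentally, the per-interval packet and byte counts behind this graph can also be produced without the GUI, using tshark's io,stat statistic (a sketch with the hypothetical capture file name):

REM packet and byte counts grouped into 10 second intervals, matching the IO Graph tick interval
tshark -q -r oracle_fetch_test.pcap -z io,stat,10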

The width of the first bump represents the time required to retrieve the data 10 rows at a time; the second bump, 50 rows at a time; the third bump, 100 rows at a time; and the remaining bumps, 500, 1,000, and 5,000 rows at a time, respectively.  The height of the bumps, in this case, indicates the number of packets transmitted between the server and the client in each of the 10 second time intervals.  When the client retrieved 10 rows at a time, only about 50% as many network packets flowed between the server and client as when the client retrieved 50 or 100 rows at a time – this, of course, increased the time required to transmit all of the requested rows.
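
As context for the differences between the bumps, a rough count of the fetch round trips that each array size requires can be computed.  SQL*Plus fetches a single row on the first fetch call and ARRAYSIZE rows on each subsequent call (as the packet captures at the end of this article show), so a back-of-the-envelope sketch for the roughly 999,001 returned rows looks like this:

-- the first fetch returns 1 row, each later fetch returns ARRAYSIZE rows
SELECT
  1 + CEIL((999001 - 1) / 10) FETCHES_10,
  1 + CEIL((999001 - 1) / 100) FETCHES_100,
  1 + CEIL((999001 - 1) / 5000) FETCHES_5000
FROM
  DUAL;

That works out to roughly 99,901 round trips at an array fetch size of 10, 9,991 at 100, and just 201 at 5,000.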

By changing Packets/Tick to Bytes/Tick we are able to determine how much data (including overhead) is actually contained in the packets flowing across the network every 10 seconds.  While the bumps in the previous screenshot showed that roughly the same number of packets flowed across the network for the array fetch sizes of 50 and 100, this screenshot shows that the packets for the array fetch size of 50 carried about 20% less usable data (plus overhead) – this explains why the bump representing the array fetch size of 50 is about 20% wider than the bump representing the array fetch size of 100.

Interesting, but why were there only about half as many packets transferred every 10 seconds when the array fetch size was set to 10 compared to when it was set to 50?  Did I hear someone say “wait and see”, or was it “latency”?  :-)  If we change the Units field to Advanced we are able to find out what is happening.  For the Filter of Graph 1 I specified tcp.stream eq 1 (the same filter that was entered automatically when we selected Follow TCP Stream earlier).  For the Calc entry I selected Max(*), and in the next field to the right I specified frame.time_delta_displayed – we saw earlier where this field was obtained in the main Wireshark window.  I then set the scale to 100000 so that I was able to see both the peaks and the valleys in the graph.  The graph shows a maximum latency between packets of about 50ms when the array fetch size was set to 10, and that maximum latency dropped significantly as the array fetch size increased to 1,000.  With the array fetch size set to 5,000 the maximum latency is nearly identical to the maximum latency when the array fetch size was set to 10 – at this point the client is essentially choking on all of the data that it is receiving; it takes the client roughly 50ms to process the rows and send the next fetch request to the server.
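
The equivalent maximum time delta per 10 second interval can also be calculated with tshark's advanced io,stat syntax (a sketch – frame.time_delta is used here in place of the GUI's frame.time_delta_displayed field, and the exact quoting may vary by platform and tshark version):

REM maximum inter-packet time delta per 10 second interval for TCP stream 1
tshark -q -r oracle_fetch_test.pcap -z "io,stat,10,MAX(frame.time_delta)tcp.stream eq 1"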

Neat, what else are we able to do?  This time we will turn off the first graph and create two additional graphs.  The first of the additional graphs shows the sum of the sizes of the TNS sections of the network packets for each of the 10 second time intervals, while the second shows the maximum data length in the TNS section – how much actual data is stuffed into the network packets (per the configured Oracle SDU size; a single SDU buffer may actually be broken into multiple packets).  A logarithmic scale is selected since the sum of the TNS section lengths will be considerably larger than the maximum data length in any given 10 second time interval.  The green line, representing the amount of actual data in the packets, shows that in the best case, when the array fetch size is set to 10, all of the packets are leaving without being full – or, more correctly, the effective SDU size is lower – so not only are half as many packets transmitted per second, but in the best case each packet also contains less data.  The situation improves a bit when the fetch array size increases to 50, and then stabilizes at a higher value when the fetch array size is bumped to 100.
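
Roughly the same two graphs can be sketched at the command line, assuming the tns.length and data.len field names used by Wireshark's TNS and Data dissectors:

REM per 10 second interval: total TNS bytes, and the largest data payload seen
tshark -q -r oracle_fetch_test.pcap -z "io,stat,10,SUM(tns.length)tns.length,MAX(data.len)data.len"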

The previous graph is somewhat useful, yet at the same time somewhat useless.  Rather than looking at the maximum of how full the data packets were in each 10 second time interval, it is probably more useful to look at how full the packets were on average.  Well, the following graph does not actually show that either – it shows the amount of data Oracle packages before transmission, as controlled by the session data unit (SDU) size, which typically has a maximum default value of 8KB.  The following modified version of the previous chart shows how the average SDU size changed as the array fetch size increased.  The effective average SDU size doubled as the array fetch size increased from 100 to 500 – does that mean that the transfer completed twice as fast?  No, as the previous graphs showed, but it might have made a greater difference on a gigabit wired network.
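
The average, rather than the maximum, is available through the same io,stat mechanism (again a sketch, with the same assumed field name):

REM average data payload per 10 second interval
tshark -q -r oracle_fetch_test.pcap -z "io,stat,10,AVG(data.len)data.len"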

Wireshark also provides the option of controlling the graph type.  In the following screenshot I added tcp.reassembled.length as Graph 4, changed Graph 2 to the Impulse style (vertical line), and changed Graph 3 to the FBar style (vertical rectangle).  Notice that the tcp.reassembled.length values are greater when the array fetch size is 100 than at any other value.  I believe this is because, on average, a greater percentage of the bytes in the TNS sections of the packets was “overhead” when array fetching 100 rows at a time than when array fetching, for example, 500 rows at a time.

With the above in mind, what is the ideal array fetch size for this SQL statement?  Stop playing with your data packets and answer the question.  :-)

——–

Added May 13, 2010:

First 4 packets retrieving rows after the initial single row fetch, fetch array size set to 10, response to 3 fetch calls:

First 4 packets retrieving rows after the initial single row fetch, fetch array size set to 5,000, response to 1 fetch call: