Favorite Free Applications to Troubleshoot Visual Manufacturing and Windows – Part 10 Wireshark

2 01 2021

January 2, 2021

I have been working with Infor Visual Manufacturing ERP system for roughly 21 years now, and have been assisting users at other companies with Visual Manufacturing issues for the last 20 years. In 2019 with Yahoo announcing the eventual shut down of their Yahoo Groups platform, I worked with two other long-term members of the Visual Manufacturing VMIUGLIST and VMIUG-TEC groups on Yahoo to set up those groups on the Google Groups platform. Since the groups were set up on Google Groups, I have periodically posted technical articles of various topics to the VMIUG-TEC group to help group members solve various types problems and to take better advantage of the Visual Manufacturing package. The most recent article series describes various free operating system commands and utilities that I have used to troubleshoot various types of issues with Visual Manufacturing, Windows servers, and networks. The article that I published in the group a few days ago is titled: “How To: Favorite Free Applications to Troubleshoot Visual and Windows – Part 10 Wireshark”. Eleven years ago on this blog I posted a series of of articles titled Network Monitoring Experimentations, where I showed how to use various functionality in Wireshark. As there is a strong tie-in with that article series and the latest article that I posted to the VMIUG-TEC group, I decided to re-post the VMIUG-TEC article here (with no changes).

————————————————————————————

This article is the tenth article in the series of articles that describe free applications and utilities for troubleshooting Visual Manufacturing problems and general Windows reliability.  There are many such applications and utilities, but this article series will cover those that I have personally found helpful over the years.  Today’s article describes Wireshark (https://www.wireshark.org/download.html ), which is a very popular network activity capture and analysis tool.  Prior to 2006 Wireshark was commonly known as Ethereal, and was renamed to Wireshark when the original developer started working for a different company in 2006.  If I recall correctly, I first started using Ethereal in either late 2001 or early 2002 while testing the effectiveness of an iptables-based Linux firewall that I built from the ground up using a Linux firewalls book and a copy of Red Hat Linux.  Since that time, Ethereal and Wireshark have quite possibly assisted me in the resolution of more computer / network problems than any other tool that I have described so far in this article series.  A few such examples (including Wireshark screen captures when possible):

  • Greatly increase the throughput of file transfers.  A scheduled file transfer from a Windows Server 2016 server to a Synology NAS over a gigabit network connection executed at an average rate of 34 MB/s as long as the server was otherwise idle.  If another scheduled file transfer, for example copying Oracle backup files to the Windows server, was also in process, both file transfers would transfer at a maximum rate of four MB/s.  If BackupExec running on another server was actively backing up files on the Windows Server 2016 server, the speed of the scheduled file transfer would drop to just four KB/s!  With the help of Wireshark problems with the network switch, network cables, and Windows server configuration with four network ports connected to the same switch (network team created with the Microsoft Network Adapter Multiplexor Driver) could be eliminated as problems, even though some unexpected behavior caused by “Large Send Offload” being enabled was found in the Wireshark’s logs.  Once the problem with the scheduled file transfer was fixed, the scheduled file transfer executed at a rate faster than 113 MB/s (close to the maximum real-world throughput for gigabit network connections) even while BackupExec was copying files from the server at a rate of about 103 MB/s.  Two actual problems were found in this case.  1) When processes (ROBOCOPY in this case, which was mentioned in part two of this article series) are run in a Windows scheduled task, those processes are given a low I/O priority (as shown in Windows Resource Monitor) and a low memory priority.  To fix that problem, I exported the Windows scheduled task to an XML file, changed the <Priority>7</Priority> value to <Priority>4</Priority>, saved the XML file, deleted the original scheduled task, and then imported the previously exported XML file.  That change prevented Windows from automatically throttling down the disk activity of the process running in the scheduled task when another process also required disk activity.  2) The ROBOCOPY command that was running in the scheduled task also required a bit of adjustment to eliminate file caching, resumable copying, and intermediate progress updates.  So, the ROBOCOPY command changed from: ROBOCOPY /MIR /R:3 /Z /XO C:\Shares\Backup Y:\Backup To: ROBOCOPY /MIR /R:3 /J /NP /XO C:\Shares\Backup Y:\Backup
  • Determine how it was possible to ping the IP address of a “transparent” firewall located between the corporate network and programmable logic controllers (PLCs), yet it was not possible to access the configuration web interface of the “transparent” firewall.  Wireshark analysis of ARP broadcast traffic found that when devices were trying to resolve the MAC address of the IP address which was assigned to the firewall, another device that was thought to be discarded a few years ago also answered the ARP requests in addition to the firewall.  Wireshark identified the MAC address of both devices using the IP address, making it possible to hunt down the rogue device.
  • Pinpoint the cause of several multi-hour network outages to an old, barely used 3COM network switch that repeatedly fell off the network and then reappeared on the network.  That switch had the spanning tree protocol enabled, as did all other managed switches, and used default settings for the spanning tree protocol (which assign the same Priority value of 32768 to all devices, causing the device with the alphabetically lowest MAC address on the network to become the Root device to which all other switches attempt to map the lowest cost communication path).  Replacing that old switch with a newer non-3COM managed switch fixed that problem without requiring any adjustment to other switches (even though another switch near the servers should have been configured as the root switch).
  • Locate apparently offline / broken security cameras operating unexpectedly in the wrong network VLAN due to an inexperienced employee unplugging network cables and then reconnecting those network cables into the wrong network switch ports.
  • Collect evidence of a PLC becoming confused due to receiving excessive ARP packets, leading to the PLC unintentionally disabling client computer access to the network virus scanner server by poisoning the MAC routing tables in network switches due to the PLC specifying an invalid corrupt Target MAC address with the virus scanner server’s IP address specified as the Target IP address in unrequested ARP response packets.
  • Discover that optimizing network performance per Microsoft’s Server 2003 documentation leads to a Nagle – Delayed ACK problem when the Windows server attempts to copy files from a Linux server that is using SAMBA (to allow Windows computers to access files stored on the Linux server).
  • Decrease the Visual Manufacturing 6.5.2 startup time from more than 10 seconds to approximately one second.  The Wireshark log indicated that a change was needed to the database specifications in the SQL.INI file (eliminate TNS: from the remotedbname= lines), and also an undocumented missing file named NetHasp.ini had to be created in the Visual EXE folder.
  • Pinpoint the source of a flood of ARP packets on the network that were causing multiple PLCs (protected behind “transparent” firewalls) to malfunction and stop responding on the network.  Wireshark showed that a single computer was repeatedly sending out ARP requests trying to sequentially resolve the MAC address of all IP addresses in multiple class C subnets (with a subnet mask of 255.255.0.0, the computer would repeatedly check the first four class C subnets).  Wireshark identified the MAC address and IP address of the sender of those packets, and then either the TRACERT command or a DNS lookup was performed to determine the computer that was sending the ARP packets (if TRACERT and DNS failed to locate the name of the device, the MAC routing tables in the various managed switches would have been checked to locate the switch port into which only the sender’s MAC address is listed to pinpoint the device).  Once the computer was located, it was connected wirelessly to a cheap Linksys router (specifically bought for this purpose), along with two other computers that were running Wireshark, and then Process Monitor (described in part five of this article series) was simultaneously run on the computer to determine what malware or other software on the Lenovo laptop was flooding the network with ARP packets.  Process Monitor nicely documents when applications send information across the network, but unfortunately it does not capture ARP traffic, which operates on Open Systems Interconnection (OSI) layer two ( https://en.wikipedia.org/wiki/OSI_model ), so Process Monitor could not quickly show which process(es) on the computer was sending the large number of ARP packets.  Wireshark running on the other computers showed that when connected to the cheap router the Lenovo laptop would repeatedly send out ARP packets sequentially by IP address in the class C subnet, and if any device responded to an ARP request, that device would then be pinged on a frequent basis without the computer sending out another ARP packet for the responding IP address (at least for a period of time when the entry aged from the computer’s resolved ARP table).  While Process Monitor could not identify the process sending out ARP packets, it did identify the programs that were running / accessing ARP.EXE (which was discussed in part two of this article series – the ARP command does not send the ARP packets, but when executed will list MAC addresses that are associated with IP addresses); a process named Lenovo.Modern.ImController.PluginHost.CompanionApp.exe (the only entry with a “Process Create” operation entry) and NortonSecurity.exe as shown in the second screen capture below (the Process Monitor log file became corrupt somehow by putting the computer to sleep with that log file still loaded in the program, but fortunately I had exported the log file to a CSV file and loaded that file into Excel to search for processes accessing ARP, so a screen capture of the information from Excel is shown below).  In Process Monitor I located one of the entries for Lenovo.Modern.ImController.PluginHost.CompanionApp.exe, and double-clicked that entry – the displayed information is shown in the third screen capture below.  From the Command Line that process appeared to be related to Lenovo Wifi Security.  A bit of research indicated that the Lenovo Wifi Security software may come preinstalled on Lenovo laptops, and may not have a legitimate purpose for remaining installed on the computer.  Temporarily disabling that program on the computer eliminated the flood of ARP packets sent across the network.

Quick Overview of Wireshark Functionality

From quickly viewing the above screen captures, you may have noticed that not all screen captures show the same columns of information in the top section of the window for each captured packet.  While it may not be obvious, Wireshark allows quickly adding additional columns for each packet row simply by right-clicking an entry in the decoded packet area near the bottom of the Wireshark window, and selecting Apply as Column from the popup menu.  Once added as a column, left-clicking that column’s header permits sorting the packet rows in ascending or descending order by the values contained in the column.  The right-click menu also contains the Apply as Filter item, which may be used to restrict the packets displayed to just those packets that have a similar value for the right-clicked statistic value.

Wireshark is available pre-compiled for Windows and Mac, and the source code may be downloaded and compiled for various other operating systems; see the download link at the start of this article for access to the setup programs and/or source code.  When installing the Windows version of Wireshark, the default values suggested by the setup program are probably good enough for most purposes, although you may want to select Wireshark Desktop Icon under the Create Shortcuts heading.

After installing Wireshark, upon starting Wireshark the top-half of the Wireshark window will show previously opened or saved Wireshark captures, while the bottom-half of the window shows network activity for the various network interfaces installed in your computer.  Starting a Wireshark capture may be as simple as double-clicking one of the listed network interfaces that shows some activity in the graph to the right of the network interface name.  For the below screen capture I single-clicked the wireless network interface to indicate the connection that I would likely select to use for the packet capture.

Selecting Options… from the Capture menu also allows starting a packet capture using a specific network interface installed in your computer, and also offers the ability to enable or disable promiscuous mode network packet captures.  Normally, network interfaces will ignore all (non-broadcast) network traffic that is NOT specifically directed to the network interface’s MAC or IP address.  Enabling promiscuous mode captures allows Wireshark to also show and log those packets that are NOT specifically directed to the network interface’s MAC or IP address; note that an upstream network switch, when functioning normally, will typically filter out most network traffic that is not specifically for the network interface’s MAC and/or IP address, unless the computer’s network port in the switch is configured as a “monitor port”, in which case a copy of all traffic passing through the switch would be also sent to the device connected to that “monitor port”.  The actual availability of promiscuous mode captures on wireless networks may be dependent on the wireless card drivers installed, so it may not be possible to capture all transmitted wireless traffic using Nagios on a computer connected wirelessly through a wireless access point.

To end a Wireshark capture, select Stop from the Capture menu, or click the red square toolbar button near the left of the toolbar (the shark-fin shaped button at the far-left of the toolbar may be used to start a capture in addition to the other methods previously mentioned).  Once capturing stops, Wireshark applies the configured color rules to the packets in the top section of the window to help draw attention to entries that may be of interest.  The color-coded vertical bar at the right of the window shows color codes of packets that were captured immediately before or after the displayed packets, allowing you to quickly jump to other potentially interesting sections of the packet log by clicking an area on that vertical bar.  The bottom half of the window shows the decoded information for the selected network packet, along with the raw bytes and ASCII character representation of the selected network packet.

By right-clicking a packet, and then selecting an item from the Follow submenu it is possible to easily apply a filter to only show the packets that are directly related to the selected packet, as well as an attempted ASCII character representation of the bytes in the packets.

The screen capture below shows the result of selecting TCP Stream from the menu for the [TCP Out-Of-Order] packet – this technique might be used to view the SQL statements sent to a database server, and the results sent back from the server.  Note that tcp.stream eq 110 appears near the top-left of the window – that is the automatically created and applied filter.  To again display all captured packets, click the X button at the far right of the green filter text area:

When troubleshooting problems with the help of a Wireshark packet capture, recognizing potential problem patterns in the packet capture may be difficult.  Recognizing patterns in the packet captures is often a significant step toward finding and fixing the source of a particular problem, so learning to recognize patterns is important.  As shown in the screen captures above, Wireshark makes effective use of color coding to draw attention to certain captured packets.  Wireshark’s Expert Information option on the Analyze menu processes the current packet capture, looking for network packets of potential interest, and then groups those packets by category.  IMPORTANT: Always make certain that the packet capture is saved before using the Expert Information option, as Wireshark will sometimes silently crash when summarizing the network packets.  [Note that if Wireshark does crash before you have saved a packet capture, the packet capture might have been written to a file with a name such as “wireshark_Wi-Fi 2CLAIW0.pcapng” (where “Wi-Fi 2” is the name of the network interface that was selected for the capture, and “CLAIW0” is a randomized set of characters) in your configured temp files folder, so you may be able to recover an unsaved Wireshark capture simply by renaming and then opening that file.]  Click the right pointing arrow next to a group name to see a summary of the packets that fall into the group, and click one of those packets to scroll the main Wireshark window to that packet.

Filters

As mentioned earlier, it is possible to right-click an entry in the decoded packet information to apply a filter which then causes Wireshark to only show other captured packets that have the same decoded value.  You are also able to manually create a filter specification by typing into the textbox that is located just below the toolbar.  As you type, Wireshark will change the background color of that textbox to either red or green depending on whether or not the filter is executable, and suggestions will be offered as you type to help you complete the filter so that it may be executed.  A couple of example filters are shown below:

Find packets set to or received from the IP address 23.38.130.175 (one of the IP addresses for support.infor.com):

ip.src == 23.38.130.175 || ip.dst == 23.38.130.175

Find ARP broadcast packets that are sent by the device having the MAC address 70:b3:d5:80:61:a4 that do not have a destination MAC address of 0a:01:05:1c:d1:64:

((frame.protocols == "eth:ethertype:arp") && (eth.src == 70:b3:d5:80:61:a4)) && !(eth.dst == 0a:01:05:1c:d1:64)

Show only OSI layer 2 network packets (technically, ARP is layer 2, but this filter eliminates ARP packets):

not ip and not arp

For additional Wireshark filter examples, see the “Wireshark Tutorial: Display Filter Expressions”  article created by Brad Duncan, or review the Wireshark documentation.

Potentially Important Additional Columns

The first time Wireshark starts only a small number of columns of information will appear for each network packet.  Depending on the type of problems that you are investigating, you may need to add additional columns of information to quickly understand the reason why certain packets appear in the packet capture.  For the screen captures that follow, I selected a TCP type network packet to view the decoded information in that packet.  As mentioned previously in this article, by right-clicking information in the decoded packet, you are able to quickly add additional columns of information for each packet in the upper window area.

In the Frame section of the decoded packet, you may want to add a column for “Time delta from previous frame”.  If you are checking for performance / communication speed problems and you are able to sufficiently eliminate unrelated information from the packet capture using filters, adding this column may help you quickly zero in on sections of the communication with the longest unexpected delays as this column will show the amount of elapsed time between the packet and the displayed packet that was logged immediately before the packet.  If you are not able to filter out unrelated communication from the Wireshark output, you may want to show the “Time since previous frame in this TCP stream” (mentioned below) instead.

If you are troubleshooting packet retransmits found in the packet capture or suspected network loops, you may want to add a column for “Identification” and “Time to Live” in the Internet Protocol Version section.  Each packet sent should have a unique “Identification” value, even if the packet is intentionally retransmitted by one of the devices.  If retransmissions appear in the Wireshark capture with the same “Identification” value, that potentially indicates that a harmful network loop exists that is causing packets to be incorrectly repeated on the network.  The “Time to Live” value typically starts at either 255, 128, or 64, and is decreased by one for every router that the packet passes through.  So, if duplicate packets appear on the network with the same “Identification” value, but different “Time to Live” values, that could indicate a routing problem that is incorrectly reintroducing packets on the network after being sent through a router.

Adding columns for “Sequence Number”, “Next Sequence Number”, “Acknowledgement Number”, “Calculated window size”, and “Time since previous frame in this TCP stream” may be helpful when investigating either unexpected network slowness or retransmissions.  If a packet appears with the same “Sequence Number”, but with a different “Identification” value, that probably indicates that the sender did not receive an acknowledgement (ACK) for that packet within a certain time limit.  If a packet appears with the same “Sequence Number” and “Identification” value, that probably indicates that there is a network loop causing duplicate packets to appear on the network.  When a device sends an ACK, it actually specifies the “Next Sequence Number” that the other device may send, rather than referring to the “Sequence Number” of the received packet (for some reason I frequently forget this acknowledgement numbering approach, which is why I am mentioning it here).  The “Time since previous frame in this TCP stream” statistic indicates the amount of elapsed time that has passed since the previous packet in a specific TCP conversation.  This statistic is very helpful when it is not possible to fully filter potentially unrelated network traffic from the packet capture.  If an unexpectedly large value appears in the column when added, then the next investigative step is to check the packets that appear in the packet capture between the current packet and the previous packet in the TCP conversation to see if, for example, a DNS lookup failed, which resulted in an eventual WINS broadcast that failed.  If nothing obvious appears in the packet capture, then the problem may be application, computer, or user “think time” causing the delay.  If the “Time since previous frame in this TCP stream” statistic does not appear in the decoded packet, first verify that you have selected a TCP type packet, right-click in the decoded packet information, and select “Protocol Preferences” – “Calculate conversation timestamps” from the popup menu.

Analyzing Performance Problems with Statistics

A quick glance at the Statistics menu in Wireshark suggests that there is substantial functionality in Wireshark to help analyze various types of potential network problems.  In this article I will only scratch the surface of the items found in the Statistics menu.  Assume that I have the Visual EXE files installed locally on a computer, that computer is connected to the company network using a VPN connection over the internet, and the person using the computer would like to start Visual Manufacturing and then the Purchase Requisition Entry Visual module.  Running Visual modules over a VPN connection using this approach is not recommended as Visual modules tend to be very chatty over the network, and with relatively high latency internet connections, Visual modules may take several seconds to respond to even the simplest command.  Out of curiosity, I wondered if Wireshark could help indicate what is causing the slowness, so I created a packet capture of Visual Manufacturing starting (from a shortcut that provided the username and password), and then opening the Purchase Requisition Entry module when connected over a VPN connection.  My first thought when ending that capture was that a whole lot of network communication was captured in the 63 second log capture.

After ending the capture, I found the network packet where the client computer first connected to the database server, and clicked that packet to select it.  From the Statistics menu I selected “TCP Stream Graphs” and then “Time Sequence (Stevens)” from the submenu.  The below screen capture shows the graph that appeared, which shows the number of network packets transmitted over time.  Fast rises indicate a lot of network packets transmitted in a short period of time, while slow rises in the network packets indicate that some sort of delay was preventing continued rapid communication with the database server.

Next, from the Statistics menu I selected “Flow Graph”.  The flow graph shows a summary of the network communication that appeared in the network capture in an easily understood format by IP address, which might be helpful to determine what was happening on the network when the client’s communication with the database server slowed, as shown in the previous screen capture.  In the following screen capture I replaced the displayed IP addresses with the function of the device (ex: Visual Client, File Server, and DB Server).  As pictured, the first 1.08 seconds was spent accessing various Visual INI files stored on a server with three potentially problematic packets, there was a 0.2 second delay with no network activity, and then communication with the database server started.  By scrolling down and to the right, additional communication, potentially unrelated to the Visual startup test, is revealed.

A lot of helpful information, including sample network captures showing various types of network activity, is accessible from Wireshark’s Help menu.  Other resources that I have found useful include the following videos and reference material:

Do not expect to learn everything about Wireshark in a few hours or days.  Instead, learn the basics of using Wireshark, and learn about the more advanced functionality as needed when troubleshooting various network or performance problems.  Of course, if you prefer to binge watch instructional videos, there are a number of very good 70+ minute long videos on YouTube from previous Sharkfest conferences to enjoy.

————————————————————————————

VMIUGLIST and VMIUG-TEC Google Groups

Access to the VMIUGLIST and VMIUG-TEC Google Groups is restricted. Anyone who is a user of Infor Visual Manufacturing, works for Infor, or works for one of the official Infor sales channel partners will be authorized to access the groups.  To keep spam under control, we ask that when requesting to join the groups, the following information is provided:

  • The Visual version you currently use.
  • The company you are working for.
  • Your company email address.
  • Your position and why you wish to join.

The new groups may be found using the following Google search page (incidentally, once you are a member of the groups, including the phrase VMIUG in a search will also allow searching for content within the groups): https://groups.google.com/forum/#!search/vmiug





Network Monitoring Experimentations 7

19 08 2010

August 19, 2010

(Back to the Previous Post in the Series)

In the previous articles of this series we saw several cases where Wirehark and other utilities helped identify various types of network related problems, and those blog articles provided screen captures that should allow you to identify network patterns that might exist in your own enviroments.  The question likely remains: “But (as a DBA) will we ever use this information?”  A recent thread on the OTN forums suggests maybe… especially if someone has “optimized” the servers.  I contributed several responses in that thread.

In the forum thread the original poster mentions that he is experiencing performance problems when querying the database over a 1Mb/s WAN connection:

“For example, a query on a heavy records table that retrieves about 700 KB of information in 500 records takes us 3:31 minutes, which gives an average speed of downloading about 3-4 KB/s, though we have upload bandwidth in abundance to the server and more than enough download bandwith in the customer’s network. With this query specifically happens that the reception / transmission of data takes place at regular intervals, receiving a chunk of around 20 KB and waiting about 5 seconds to receive new data.”

Waiting 5 seconds to retrieve 20KB – maybe someone swapped the 1Mb/s connection for a 56Kb/s dial-up modem which should be able to download 5KB to 7KB per second.  Or, maybe there is a better explanation – network monitoring applied.  In the OTN thread Sybrand offered a very good suggestion to use a trace route (TRACERT), the output of which looked something like this:

----
 1  * * * Tiempo de espera agotado para esta solicitud.
 2  * * * Tiempo de espera agotado para esta solicitud.
 3  69 ms  69 ms  69 ms
 4  69 ms  70 ms  70 ms
 5  70 ms  72 ms  70 ms
 6  69 ms  70 ms  73 ms
 7  70 ms  70 ms  70 ms
 8  82 ms  88 ms  83 ms
 9  97 ms 102 ms  99 ms
10  98 ms  97 ms 101 ms
11  97 ms  98 ms  98 ms
12  99 ms  98 ms  98 ms
13 102 ms  98 ms  97 ms
14  98 ms  97 ms  97 ms
----

The two timeouts at the start of the output imply that there are two devices, very likely firewalls, that are in the network pathway closest to the computer that executed the command.  The times for a response from each hop on the way to the other side of the connection are between 70ms (0.070 seconds) and 100ms (0.100 seconds).  The high response times imply that the connection probably is not a T1 or similar high-cost 1Mb/s connection, and the low response times imply that the connection cannot be satellite based where 1000ms (1.000 second) to 2500ms (2.500 seconds) response times would be expected.  The original poster confirmed that the connection is a DSL connection – such a connection likely does not have a quality of service guarantee in the event that performance is very slow, or that would take top priority if 95% of the packets are dropped (possibly needing to be retransmitted multiple times).  The times reported by trace route are important – if a network packet is sent from the server to the client it will probably require between 0.035 and 0.500 seconds for the packet to reach the client, and an equal amount of time for the ACK confirmation from the client essentially stating that the packet was successfully received and that it is ready for the next set of packets.

Sybrand then offered another very good suggestion, enabling a 10046 trace (reference #2) while remotely querying the database.  A TKPROF summary of the 10046 trace file showed the following statistics:

Execution Plan Time:   0.013437 seconds
CPU Time:              0.01 seconds
Elapsed Time:        185.38 seconds
Rows:                500

Sever-side waits:              0.03 seconds
SQL*Net more data to client: 185.32 seconds, 5.13 seconds max wait, 373 waits
SQL*Net message from client:  10.50 seconds, 5.02 seconds max wait,   6 waits

If you have viewed many TKPROF summaries, or raw 10046 trace files, you would probably recognize that the array fetch size was probably set at 100, and also just how uncommon 5+ second wait are for the “SQL*Net more data to client” wait event.  Even though the Oracle Database documentation provides little information about this event, you should be able to recognize that something is very wrong with the 5.13 second maximum wait time.  In the opening quote of this blog article the OP mentioned that it required about 5 seconds to transmit 20KB over the WAN connection – could it be that one of these 5 second long “SQL*Net more data to client” waits happened during the transfer, making it appear that the average thoughput was roughly 4KB/s?

Now what, we have tried TRACERT and a TKPROF summary of a 10046 trace file.  Maybe manually analyze the raw 10046 trace file looking for patterns?  Or, we can try what I suggested in the tread – network monitoring with Wireshark.  Fortunately, the OP already had access to Wireshark, he just needed a little assistance working through the packet capture.  The OP mentioned:

“But we used Wireshark in the server side and we observed the strange behaviour I spoke earlier: several packets (all together around 20 KB) an then around 5 seconds of NOTHING, nor ACK neither dropped packets. I didn’t test the client side but I assume that’d be the same stuff.”

The important key to understanding what is happening is to determine when the 5 seconds of inactivity happened.  Typically, a computer will send two network packets to a remote device and then wait patiently for an acknowledgement (ACK) that the two packets were received before it sends the next two packets.  If an ACK is not received within 0.2 seconds, the sender will assume that the packets were lost in transit and resend the packets.  So, we need to examine when the 5 seconds of inactivity happened:

  • Immediately after the server sent two packets to the client computer.
  • Immediately after the server sent a single packet to the client computer.
  • Immediately after receiving an ACK packet from the client computer.

In the case of the first bullet point, that might tell us that something happened to the client computer or the DSL connection.  However, the server should have taken action after 0.2 seconds and attempted to send the packets again.  In the case of the second and third bullets, we probably need to look at either what is happening in the server (maybe it took 5 seconds to jump back on the CPU to fetch the next set of rows, but it seems like that would have shown in the TKPROF summary?) or the local network (maybe the switch port dropped out for 5 seconds, but that would have also affected the clients that are local to the server).

Signs of trouble:

“The network administrators from the housing service have reconfigured the interface to force 100 Mb full duplex and guess the time it takes now to the query that took 3:31 s before: 1:38 s!”

Someone has been “optimizing” the servers and/or the network.  What happens when a switch port is forced to 100Mb/s full duplex without touching the server, which by the way has a gigabit network interface?

http://www.cisco.com/en/US/tech/tk389/tk214/technologies_tech_note09186a0080094781.shtml#when:

“One of the most common causes of performance issues on 10/100 Mb Ethernet links occurs when one port on the link operates at half-duplex while the other port operates at full-duplex. This occurs when one or both ports on a link are reset and the auto-negotiation process does not result in both link partners having the same configuration. It also can occur when users reconfigure one side of a link and forget to reconfigure the other side. Both sides of a link should have auto-negotiation on, or both sides should have it off. Cisco recommends to leave auto-negotiation on for those devices compliant with 802.3u.”

http://en.wikipedia.org/wiki/Autonegotiation:

“A duplex mismatch occurs when two connected devices are configured in different duplex modes. This may happen for example if one is configured for autonegotiation while the other one has a fixed mode of operation that is full duplex (no autonegotiation). In such conditions, the autonegotiation device correctly detects the speed of operation, but is unable to correctly detect the duplex mode. As a result, it sets the correct speed but starts using the half duplex mode.”

http://docs.hp.com/en/601/autonegotiation.pdf:

“However, parallel detection cannot detect the duplex mode so the duplex mode will default to Half duplex. The resulting link configuration will be able to send and receive frames but performance will be very poor. The performance will be poor because the Full duplex MAC disables the carrier sense and collision detect circuitry. So, when i t has frames to transmit it will transmit irrespective of what the Half Duplex MAC is doing. This will cause collisions with the Full Duplex MAC not backing off.”

From one of my posts in the thread:

“Something seems to be very odd here. First, the two sides are connected by a high latency (Internet) connection with many hops, and an average ping time of about 0.1 seconds. You will probably find that the MTU size (consider this to be the maximum packet size before packet fragmentation) when jumping through the various routers is probably about 1500 bytes. If the network administrators have configured the servers and local switch to support jumbo frames (8KB packets, for instance), those jumbo frames will be split into multiple packets so that they are able to pass through the intermediate routers between the server and client computers (you might not see a problem on the server side while one might appear on the client side). You indicate that the time delta between packets is about 0.0001 seconds (0.1ms), which is significantly less than the ping times suggested by the trace route – how many of those packets appear together with a delay of 0.0001 seconds in between?

If you have a Nagle/delayed ACK problem where one side is set to send an ACK after 13 (or some other number more than 2) packets while the other side is set to send an ACK after the default of 2 packets, that might, in part, explain the 5 seconds where no packets appeared on the server side of the Wireshark capture (for reference, on a gigabit network, a 130MB file will typically transfer across the network in about 3 second. When a Nagle/delayed ACK problem is present, that same transfer requires roughly 45 minutes).

Your network administrators forced the servers to 100Mb/s and it improved performance? This very well could be just a coincidence. If the server is forced to 100Mb/s full duplex but the switch port is not also forced to 100Mb/s full duplex, in most cases the actual connection speed will fall back to the minimum possible configuration – 10Mb/s half duplex. I wonder if the network administrators previously forced the network switch port to 100Mb/s full duplex but forgot to do the same for the server? Still, at 10Mb/s half duplex, the speed (throughput) is probably faster than your WAN link, so that change should not have made much of a difference. The very low latency between the server and switch is still much lower than the latency that you reported with the trace route – and that is what is likely killing performance. I wonder if the network administrators also disabled jumbo frames when they forced the connection speed? Disabling jumbo frames might explain the sudden decrease in time.”

I mentioned falling back to 10Mb/s half duplex, while the other quotes above seem to indicate that the speed will fall back to 100Mb/s half duplex – maybe some of these switches now consider 100Mb/s half duplex as the minimum.  Even at 10Mb/s half duplex the speed should be enough to fully saturate the 1Mb/s WAN connection… until we consider latency and the ACK frequency.  If it takes an average of 0.9 seconds for the server to see the next request for the next two 1500 byte network packets, that just might slow down the connection a bit.  Someone might have the wise idea of configuring the server to send more than two packets, maybe 16, at a time before waiting for the client to send back a single ACK packet.  Sounds like a good idea, right?  In part three of this series I showed why that might not be such a good idea – especially if only one side of the connection is configured for the modified ACK frequency.  That could never happen in a production environment, right?

The OP provided a screen capture of Wireshark running on his server.  I have attempted to make the screen capture a bit more generic and have marked up the screen capture:

I provided the following comment:

“That screen capture (assuming that it was captured from the server) shows that the server sent 16 packets in rapid fire and then waited. 78ms later [indicated by the green boxes in the screen capture] (roughly the ping time that you reported based on the TRACERT output) your server received 8 ACK packets – one for every 2 packets sent by the server (the client side is not configured to ACK after every 8 or 16 packets as the server appears to be configured). You will see a somewhat similar screen capture here where I “optimized” the ACK frequency on one side of the connection with a large array fetch size.

If I am reading the Wireshark screen capture correctly, the 5 second delay [the blue boxes in the screen capture show the roughly 5 second delays] happened after the server received 8 ACK packets (ACKs for the previous 16 packets that the server sent), and before the server continued to send the next batch of packets to the client. The problem appears to be on the server side of the connection – the server is probably waiting for another packet or a timeout. The typical timeout permitted to wait for an expected packet is 0.2 seconds, much less than the 5 second delay. Is it possible that someone “optimized” some of the other network parameters on the Windows server?” 

This OP asked why it appears that the client sent back 8 ACK packets in rapid fire to the server, this is my response:

“It is an illusion that all of the ACK packets are being sent together – well, not really. The ACK packet is a little bit like a method of controlling the speed of a transfer: the sender promises not to send more than (by default) two network packets at a time before receiving confirmation that the two packets were safely received by the sender. If the sender does not receive an ACK packet typically within 0.2 seconds it assumes that the packets were lost in transmission and attempts to resend the packets. Your server is sending out 16 packets before pausing to wait for the return of the single ACK packet to acknowledge that the previous 16 packets were received. The client computer at the other end of the DSL line receives the first two packets 0.04 seconds after the server sent them and replies with an ACK for those two packets. It then immediately sees the next two packets and replies with an ACK for those two. It then immediately sees the next two packets are replies with an ACK for those two. This continues until the client receives the 15th and 16th packets (a Wireshark capture on the client should show this). The server sees the 8 ACK packets for every 2 packets sent coming in all at once (0.04 seconds after being sent by the client), when it is expecting a single ACK for the previous 16 packets. I assume that this is at least part of the cause for the 5 second pauses.”

The OP provided a screen capture of his client when it submitted the same SQL statement (likely from a different time period), and mentioned that there are a lot of checksum errors appearing on the client side:

This is my response from the thread:

“The header checksum errors are caused by TCP checksum offloading. You will typically receive that message when a better quality network card is installed in the computer – the packet checksum calculation/verification is handled by the CPU in the network card itself, rather than relying on the host computer’s CPU. I think that one of my blog articles mentions this feature. More information may be found here.

That is interesting that you periodically see 4 packets arriving with a single ACK packet sent in return – possibly some of the packets were received by the client out of order, so the client had to wait for the expected packet to be received. The typical pattern for a database sending packets seems to be one or more TCP packets with “[TCP segment of a ressembled PDU]” in the info column followed by a single TNS packet with “Response, Data (6), Data” in the info column – that TNS packet seems to marks the end of the array fetch (although I may need to perform additional analysis). Based on some of the tests that I have performed with varying the array fetch size, if you cannot track down why the ACK frequency is wrong, you may be able to work around the issue by setting the array fetch size much smaller.

In the above blog article, you can compare the “Optimized” transfer speed as the fetch array size is increased, and also compare it with the unoptimized (no one has changed the ACK frequency) transfer. You will see that as the fetch array size increases, the ACK frequency drops off – this is a setup where I modified the ACK frequency on the client, while I believe that the ACK frequency of your server was adjusted.”

The OP also stated that on the server-side of the network he tried a couple of network transfer performance tests.  He tried copying 640MB of data and saw estimated completion times of 140 to 250 minutes.  He then changed a setting on the server’s network interface and the time dropped to about 4 minutes.  In the forum thread I mentioned the following:

“Something is wrong with the above. On a 100Mb/s connection, I am able to copy a 129MB file to a laptop in about 18 seconds. That suggests a 10Mb/s connection (which you will see when the server and switch are not set to the same speed – if not auto-negotiate) would be able to transfer roughly 12.9MB in the same 18 seconds. Doing the math:

129MB in 18 seconds at 100Mb/s, 12.9MB in 18 seconds at 10Mb/s
640MB/12.9 * 18 = 893 seconds = 14.9 minutes to transfer 640MB over a 10Mb/s connection.
(or 12.9MB/18 = 0.7166MB/s, 640MB / 0.7166MB ~ 893 seconds)

Or another way, assuming that the real-world speed of a 10Mb/s is close to 7Mb/s:

640MB / 7mb/s * 8bits = 731.43 seconds = 12.2 minutes to transfer 640MB over a 10Mb/s connection.

You should not be seeing 140 or 250 minutes (unless your server is not sending ACK correctly, in which case Wireshark would show periodic 0.2 second delays) – it should be close to 15 minutes.”

—-

As I mentioned, this OTN thread is interesting, providing us an opportunity to use some of the tools mentioned in earlier blog articles to try to find a resolution to a particularily troubling problem found in the wild.  While the complete cause of the 5 second delay problem has need yet been discovered, it should be an interesting thread to follow.  My vote on the cause of the problem is a combination of delayed ACKs due to a configuration change on the server (problem not yet found in the Windows registry), out of order receipt of packets, the server becoming confused at receiving 8 ACK packets when it expected 1, and a couple of network hardware configuration issues (maybe even the use of jumbo packets that are being fragmented) such as having to conform to the 1476 byte MTU of the DSL connection when the network local to the server is configured for a larger MTU size.





Network Monitoring Experimentations 6

14 05 2010

May 14, 2010

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

The last Network Monitoring Experimentations article encountered an unexpected snag – a SQL*Net optimization decreased the average number of bytes needed to transmit each row from about 889 bytes to roughly 86 bytes.  While the SQL*Net optimization is great from a performance standpoint (see the summary at the end of this article), it caused problems for some of the intended analysis in the previous article.  So, for a proper comparison, we will slightly change the inserted data when creating table T6 so that column C2 stores unique values (two random letters followed by 98 letter A characters):

CREATE TABLE T6 (
  C1 VARCHAR2(10),
  C2 VARCHAR2(100),
  PRIMARY KEY (C1));

INSERT INTO T6 NOLOGGING
SELECT
  'A'||TO_CHAR(ROWNUM,'0000000'),
  RPAD(DBMS_RANDOM.STRING('A',2),100,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T6',CASCADE=>TRUE)

We will take the script used in the previous Network Monitoring blog article and modify it to point to the new table T6, and then append the original script onto the end of the modified script.  The modified portion of the script that accesses table T6 looks like this:

SET TIMING ON

SET AUTOTRACE TRACEONLY STATISTICS

SET ARRAYSIZE 10

SELECT /*+ USE_NL(T61 T62 T63 T64 T65 T66 T67 T68) */
  T61.C1,
  T61.C2,
  T62.C1,
  T62.C2,
  T63.C1,
  T63.C2,
  T64.C1,
  T64.C2,
  T65.C1,
  T65.C2,
  T66.C1,
  T66.C2,
  T67.C1,
  T67.C2,
  T68.C1,
  T68.C2
FROM
  T6 T61,
  T6 T62,
  T6 T63,
  T6 T64,
  T6 T65,
  T6 T66,
  T6 T67,
  T6 T68
WHERE
  T61.C1=T62.C1
  AND T61.C1=T63.C1
  AND T61.C1=T64.C1
  AND T61.C1=T65.C1
  AND T61.C1=T66.C1
  AND T61.C1=T67.C1
  AND T61.C1=T68.C1
  AND T61.C1 BETWEEN 'A 0001000' AND 'A 1000000';

host sleep 5

SET ARRAYSIZE 50

SELECT /*+ USE_NL(T61 T62 T63 T64 T65 T66 T67 T68) */
  T61.C1,
  T61.C2,
  T62.C1,
  T62.C2,
  T63.C1,
  T63.C2,
  T64.C1,
  T64.C2,
  T65.C1,
  T65.C2,
  T66.C1,
  T66.C2,
  T67.C1,
  T67.C2,
  T68.C1,
  T68.C2
FROM
  T6 T61,
  T6 T62,
  T6 T63,
  T6 T64,
  T6 T65,
  T6 T66,
  T6 T67,
  T6 T68
WHERE
  T61.C1=T62.C1
  AND T61.C1=T63.C1
  AND T61.C1=T64.C1
  AND T61.C1=T65.C1
  AND T61.C1=T66.C1
  AND T61.C1=T67.C1
  AND T61.C1=T68.C1
  AND T61.C1 BETWEEN 'A 0001000' AND 'A 1000000';

host sleep 5

SET ARRAYSIZE 100

SELECT /*+ USE_NL(T61 T62 T63 T64 T65 T66 T67 T68) */
  T61.C1,
  T61.C2,
  T62.C1,
  T62.C2,
  T63.C1,
  T63.C2,
  T64.C1,
  T64.C2,
  T65.C1,
  T65.C2,
  T66.C1,
  T66.C2,
  T67.C1,
  T67.C2,
  T68.C1,
  T68.C2
FROM
  T6 T61,
  T6 T62,
  T6 T63,
  T6 T64,
  T6 T65,
  T6 T66,
  T6 T67,
  T6 T68
WHERE
  T61.C1=T62.C1
  AND T61.C1=T63.C1
  AND T61.C1=T64.C1
  AND T61.C1=T65.C1
  AND T61.C1=T66.C1
  AND T61.C1=T67.C1
  AND T61.C1=T68.C1
  AND T61.C1 BETWEEN 'A 0001000' AND 'A 1000000';

host sleep 5

SET ARRAYSIZE 500

SELECT /*+ USE_NL(T61 T62 T63 T64 T65 T66 T67 T68) */
  T61.C1,
  T61.C2,
  T62.C1,
  T62.C2,
  T63.C1,
  T63.C2,
  T64.C1,
  T64.C2,
  T65.C1,
  T65.C2,
  T66.C1,
  T66.C2,
  T67.C1,
  T67.C2,
  T68.C1,
  T68.C2
FROM
  T6 T61,
  T6 T62,
  T6 T63,
  T6 T64,
  T6 T65,
  T6 T66,
  T6 T67,
  T6 T68
WHERE
  T61.C1=T62.C1
  AND T61.C1=T63.C1
  AND T61.C1=T64.C1
  AND T61.C1=T65.C1
  AND T61.C1=T66.C1
  AND T61.C1=T67.C1
  AND T61.C1=T68.C1
  AND T61.C1 BETWEEN 'A 0001000' AND 'A 1000000';

host sleep 5

SET ARRAYSIZE 1000

SELECT /*+ USE_NL(T61 T62 T63 T64 T65 T66 T67 T68) */
  T61.C1,
  T61.C2,
  T62.C1,
  T62.C2,
  T63.C1,
  T63.C2,
  T64.C1,
  T64.C2,
  T65.C1,
  T65.C2,
  T66.C1,
  T66.C2,
  T67.C1,
  T67.C2,
  T68.C1,
  T68.C2
FROM
  T6 T61,
  T6 T62,
  T6 T63,
  T6 T64,
  T6 T65,
  T6 T66,
  T6 T67,
  T6 T68
WHERE
  T61.C1=T62.C1
  AND T61.C1=T63.C1
  AND T61.C1=T64.C1
  AND T61.C1=T65.C1
  AND T61.C1=T66.C1
  AND T61.C1=T67.C1
  AND T61.C1=T68.C1
  AND T61.C1 BETWEEN 'A 0001000' AND 'A 1000000';

host sleep 5

SET ARRAYSIZE 5000

SELECT /*+ USE_NL(T61 T62 T63 T64 T65 T66 T67 T68) */
  T61.C1,
  T61.C2,
  T62.C1,
  T62.C2,
  T63.C1,
  T63.C2,
  T64.C1,
  T64.C2,
  T65.C1,
  T65.C2,
  T66.C1,
  T66.C2,
  T67.C1,
  T67.C2,
  T68.C1,
  T68.C2
FROM
  T6 T61,
  T6 T62,
  T6 T63,
  T6 T64,
  T6 T65,
  T6 T66,
  T6 T67,
  T6 T68
WHERE
  T61.C1=T62.C1
  AND T61.C1=T63.C1
  AND T61.C1=T64.C1
  AND T61.C1=T65.C1
  AND T61.C1=T66.C1
  AND T61.C1=T67.C1
  AND T61.C1=T68.C1
  AND T61.C1 BETWEEN 'A 0001000' AND 'A 1000000';

Just to keep things interesting, the laptop used in the previous article is now connected through an 802.11N wireless connection (gigabit uplink to the server), rather than an 802.11G wireless connection.  So, what is the output of the new combined script?  Once again, the server is running Oracle Database 11.2.0.1 and the client is a dual core 2.0GHz Core 2 Duo laptop (the client CPU may be a critical factor).

Table T6:

SQL> SET ARRAYSIZE 10

Elapsed: 00:04:49.97

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
     112906  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  896904186  bytes sent via SQL*Net to client                                  
    1099259  bytes received via SQL*Net from client                            
      99902  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 50

Elapsed: 00:02:48.73

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      35563  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  889871426  bytes sent via SQL*Net to client                                  
     220139  bytes received via SQL*Net from client                            
      19982  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed
---
SQL> SET ARRAYSIZE 100

Elapsed: 00:02:43.94

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      25896  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  888992556  bytes sent via SQL*Net to client                                  
     110249  bytes received via SQL*Net from client                            
       9992  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 500

Elapsed: 00:02:34.81

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      18162  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  888291260  bytes sent via SQL*Net to client                                  
      22337  bytes received via SQL*Net from client                            
       2000  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 1000

Elapsed: 00:02:33.64

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      17195  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  888205848  bytes sent via SQL*Net to client                                  
      11348  bytes received via SQL*Net from client                            
       1001  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 5000

Elapsed: 00:02:32.59

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      16421  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  888153854  bytes sent via SQL*Net to client                                  
       2548  bytes received via SQL*Net from client                            
        201  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    

At an array fetch size of 10 the client is receiving about 855.35MB, and at an array fetch size of 5,000 the client is receiving about 847.01MB.

Let’s contrast the above with the test for table T5.

Table T5:

SQL> SET ARRAYSIZE 10

Elapsed: 00:02:01.55

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
     112906  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   94508146  bytes sent via SQL*Net to client                                  
    1099259  bytes received via SQL*Net from client                            
      99902  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 50

Elapsed: 00:00:56.37

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      35564  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   87635026  bytes sent via SQL*Net to client                                  
     220139  bytes received via SQL*Net from client                            
      19982  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 100

Elapsed: 00:00:40.77

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      25896  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   86775886  bytes sent via SQL*Net to client                                  
     110249  bytes received via SQL*Net from client                            
       9992  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 500

Elapsed: 00:00:27.31

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      18162  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   86088574  bytes sent via SQL*Net to client                                  
      22337  bytes received via SQL*Net from client                            
       2000  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 1000

Elapsed: 00:00:27.03

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      17195  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   86002660  bytes sent via SQL*Net to client                                  
      11348  bytes received via SQL*Net from client                            
       1001  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 5000

Elapsed: 00:00:24.64

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      16421  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   85933880  bytes sent via SQL*Net to client                                  
       2548  bytes received via SQL*Net from client                            
        201  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    

At an array fetch size of 10, for table T5 the client is receiving about 90.13MB (855.35MB for table T6), and at an array fetch size of 5,000 the client is receiving about 81.95MB (847.01MB for table T6).  Incidentally, the client’s CPU utilization fluctuated between 50% and 99% utilization, indicating that the client’s dual core CPU may have played a limiting factor in the transfer speed.

—-

You must be wondering what kind of an impact the 802.11N wireless had on the overall transfer time.  During several of the transfers the Windows Vista Resource Meter indicated that the transfer speed was between 45Mb/s and 48Mb/s with the maximum network transfer capacity indicated as 33% utilized (this upper limit of maximum network transfer capacity did not change when connected to an 802.11G network).

The following execution statistics show the output when the same laptop was connected to the network on a 100Mb/s connection (the maximum supported by the laptop):

Table T6:

SQL> SET ARRAYSIZE 10

Elapsed: 00:02:09.54

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
     112906  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  896904186  bytes sent via SQL*Net to client                                  
    1099260  bytes received via SQL*Net from client                            
      99902  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 50

Elapsed: 00:01:33.00

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      35563  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  889871426  bytes sent via SQL*Net to client                                  
     220140  bytes received via SQL*Net from client                            
      19982  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 100

Elapsed: 00:01:29.23

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      25896  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  888992556  bytes sent via SQL*Net to client                                  
     110250  bytes received via SQL*Net from client                            
       9992  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 500

Elapsed: 00:01:25.08

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      18162  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  888291260  bytes sent via SQL*Net to client                                  
      22338  bytes received via SQL*Net from client                            
       2000  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 1000

Elapsed: 00:01:24.84

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      17195  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  888205848  bytes sent via SQL*Net to client                                  
      11349  bytes received via SQL*Net from client                            
       1001  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 5000

Elapsed: 00:01:24.83

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      16421  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  888153854  bytes sent via SQL*Net to client                                  
       2549  bytes received via SQL*Net from client                            
        201  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    

Table T5:

SQL> SET ARRAYSIZE 10

Elapsed: 00:00:56.28

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
     112906  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   94508146  bytes sent via SQL*Net to client                                  
    1099260  bytes received via SQL*Net from client                            
      99902  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 50

Elapsed: 00:00:28.36

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      35564  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   87635026  bytes sent via SQL*Net to client                                  
     220140  bytes received via SQL*Net from client                            
      19982  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 100

Elapsed: 00:00:24.19

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      25896  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   86775886  bytes sent via SQL*Net to client                                  
     110250  bytes received via SQL*Net from client                            
       9992  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 500

Elapsed: 00:00:18.37

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      18162  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   86088574  bytes sent via SQL*Net to client                                  
      22338  bytes received via SQL*Net from client                            
       2000  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 1000

Elapsed: 00:00:17.95

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      17195  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   86002660  bytes sent via SQL*Net to client                                  
      11349  bytes received via SQL*Net from client                            
       1001  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 5000

Elapsed: 00:00:17.67

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      16421  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   85933880  bytes sent via SQL*Net to client                                  
       2549  bytes received via SQL*Net from client                            
        201  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    

The above output indicates that the lower latency 100Mb/s wired connection did decrease the transfer time so that the transfer of result set from table T6 (no SQL*Net compression) – the array fetch setting of 10 required roughly the same amount of time as the transfer of the result set from table T5 (with SQL*Net compression) when the laptop was connected wirelessly.  The CPU utilization of the client also dropped slightly – possibly because the laptop no longer needed to perform the WPA2 encryption/decryption of the network packets (the wireless network is protected by WPA2 encryption). 

—-

But what about a gigabit connection – would that help?  Unfortunately, the laptop does not support gigabit network speeds.  With the wired connection topping out at about 83Mb/s to 87Mb/s and the CPU utilization running as high as it was, could we actually achieve a faster transfer speed?  To find out, we will switch to a slower desktop computer that is running a 3.8GHz Pentium 4 CPU with hyperthreading enabled.  This client, like the server, has a gigabit network card and probably supports CPU offloading of the TCP/IP checksums.  So, how well will this second client perform with our test script?

Table T6:

SQL> SET ARRAYSIZE 10

Elapsed: 00:00:56.29

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
     112906  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  896904186  bytes sent via SQL*Net to client                                  
    1099260  bytes received via SQL*Net from client                            
      99902  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 50 

Elapsed: 00:00:30.12

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      35563  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  889871426  bytes sent via SQL*Net to client                                  
     220140  bytes received via SQL*Net from client                            
      19982  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 100

Elapsed: 00:00:27.75

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      25896  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  888992556  bytes sent via SQL*Net to client                                  
     110250  bytes received via SQL*Net from client                            
       9992  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 500

Elapsed: 00:00:26.21

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      18162  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  888291260  bytes sent via SQL*Net to client                                  
      22338  bytes received via SQL*Net from client                            
       2000  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 1000

Elapsed: 00:00:25.87

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      17195  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  888205848  bytes sent via SQL*Net to client                                  
      11349  bytes received via SQL*Net from client                            
       1001  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 5000

Elapsed: 00:00:26.71

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      16421  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  888153854  bytes sent via SQL*Net to client                                  
       2549  bytes received via SQL*Net from client                            
        201  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    

The above showed that the gigabit connection helped significantly.  The gigabit performance results for table T5 follow.

Table T5:

SQL> SET ARRAYSIZE 10

Elapsed: 00:00:47.17

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
     112906  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   94508146  bytes sent via SQL*Net to client                                  
    1099260  bytes received via SQL*Net from client                            
      99902  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---

SQL> SET ARRAYSIZE 50

Elapsed: 00:00:19.45

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      35564  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   87635026  bytes sent via SQL*Net to client                                  
     220140  bytes received via SQL*Net from client                            
      19982  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 100

Elapsed: 00:00:16.28

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      25896  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   86775886  bytes sent via SQL*Net to client                                  
     110250  bytes received via SQL*Net from client                            
       9992  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 500

Elapsed: 00:00:12.98

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      18162  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   86088574  bytes sent via SQL*Net to client                                  
      22338  bytes received via SQL*Net from client                            
       2000  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 1000

Elapsed: 00:00:12.68

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      17195  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   86002660  bytes sent via SQL*Net to client                                  
      11349  bytes received via SQL*Net from client                            
       1001  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 5000

Elapsed: 00:00:13.29

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      16421  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   85933880  bytes sent via SQL*Net to client                                  
       2549  bytes received via SQL*Net from client                            
        201  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    

The above shows that the older client computer with the gigabit network card continued to decrease the transfer time up to the array fetch size of 1,000, and then the time increased slightly with the jump to the array fetch size of 5,000.

Just how fast was the transfer speed on the slower client computer?  I monitored the performance using a program called NetPerSec which was developed by PC Magazine several years ago.  Keep an eye on the right side of the graphs for the current transfer speeds.

ARRAYSIZE 10 – roughly 135.3Mb/s receive, 3.4Mb/s send:

ARRAYSIZE 100 – roughly 275.0Mb/s receive, 5.8Mb/s send:

ARRAYSIZE 500 – roughly 295.6Mb/s receive, 5.8Mb/s send:

ARRAYSIZE 1000 – roughly 295.9Mb/s receive, 5.8Mb/s send:

ARRAYSIZE 5000 – roughly 294.9Mb/s receive, 5.7Mb/s send:

Will it make any difference if we completely eliminate the network – running the SQL statement directly on the server?

Table T6:

SQL> SET ARRAYSIZE 10

Elapsed: 00:00:21.73

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
     112906  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  906694810  bytes sent via SQL*Net to client                                  
    1099420  bytes received via SQL*Net from client                            
      99902  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 50

Elapsed: 00:00:14.92

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      35563  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  891829890  bytes sent via SQL*Net to client                                  
     220300  bytes received via SQL*Net from client                            
      19982  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 100

Elapsed: 00:00:14.25

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      25896  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  889972000  bytes sent via SQL*Net to client                                  
     110410  bytes received via SQL*Net from client                            
       9992  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 500

Elapsed: 00:00:13.47

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      18162  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  888487488  bytes sent via SQL*Net to client                                  
      22498  bytes received via SQL*Net from client                            
       2000  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 1000

Elapsed: 00:00:13.43

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      17195  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  888304174  bytes sent via SQL*Net to client                                  
      11509  bytes received via SQL*Net from client                            
       1001  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 5000

Elapsed: 00:00:13.46

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      16421  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  888173808  bytes sent via SQL*Net to client                                  
       2709  bytes received via SQL*Net from client                            
        201  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    

Table T5:

SQL> SET ARRAYSIZE 10

Elapsed: 00:00:15.60

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
     112906  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
  104198871  bytes sent via SQL*Net to client                                  
    1099420  bytes received via SQL*Net from client                            
      99902  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 50

Elapsed: 00:00:09.40

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      35564  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   89573511  bytes sent via SQL*Net to client                                  
     220300  bytes received via SQL*Net from client                            
      19982  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 100

Elapsed: 00:00:10.06

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      25896  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   87745341  bytes sent via SQL*Net to client                                  
     110410  bytes received via SQL*Net from client                            
       9992  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 500

Elapsed: 00:00:07.14

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      18162  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   86282805  bytes sent via SQL*Net to client                                  
      22498  bytes received via SQL*Net from client                            
       2000  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 1000

Elapsed: 00:00:06.77

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      17195  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   86099988  bytes sent via SQL*Net to client                                  
      11509  bytes received via SQL*Net from client                            
       1001  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    
---
SQL> SET ARRAYSIZE 5000

Elapsed: 00:00:06.66

Statistics
----------------------------------------------------------                     
          0  recursive calls                                                   
          0  db block gets                                                     
      16421  consistent gets                                                   
          0  physical reads                                                    
          0  redo size                                                         
   85953635  bytes sent via SQL*Net to client                                  
       2709  bytes received via SQL*Net from client                            
        201  SQL*Net roundtrips to/from client                                 
          0  sorts (memory)                                                    
          0  sorts (disk)                                                      
     999001  rows processed                                                    

Running the test directly on the server had a significant impact on the execution time.

Summarizing the performance results:

802.11G – Laptop and wireless access point set to 802.11G (54Mb/s)
802.11N – Laptop and wireless access point set to 802.11N speed
Netbook N – Netbook with Atom N450 CPU and 802.11N speed
100Mb/s – Laptop with wired 100Mb/s network connection
1000Mb/s – Desktop with wired gigabit network connection
On Server – Executed directly on the server’s console (no network)

The Windows Vista client has a built in Resource Monitor that helps when it is necessary to associate CPU usage, disk performance, network transfer speed, and memory utilization.  The following screen capture was generated during the 802.11G transfer test with the array fetch size set to 500:

The Windows 7 client also has a built in Resource Monitor that helps when it is necessary to associate CPU usage, disk performance, network transfer speed, and memory utilization.  The following screen capture was generated during the Netbook N transfer test with the array fetch size set to 500 – notice the CPU usage for the System Interrupts is shown as consuming 30% of the available CPU time.  However, because the netbook has a single core hyperthreaded CPU, the actual utilization for System Interrupts might actually be twice as high (60%) due to hyperthreading simulating the existence of a second CPU:

While the above tests may have limited value for your specific project, the tests do show the potential impact of SQL*Net compression, the performance impact of different network communication links, and the potential impact the network communication has on the client’s CPU utilization.





Network Monitoring Experimentations 5

27 04 2010

April 27, 2010 (Updated May 13, 2010)

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Previous articles in this series showed a lot of interesting capabilities that Wireshark and similar packet capture utilities offer, such as seeing the behind the scenes SQL statements that do not appear in a 10046 trace when Autotrace is enabled in SQL*Plus.  But there is more that Wireshark offers to assist with analysis of Oracle data transfers across the network.  Today’s article explores some of these additional features while experimenting with different array fetch sizes using Oracle Database 11.2.0.1 with default listener settings on the server side, and Oracle client 11.1.0.7 on the client-side, with an 802.11g wireless network between the two computers (greater network packet latency than a wired connection).

The test table and SQL statement used in this article is similar to those used in this article, which demonstrated that the execution plan on Oracle Database 11.2.0.1 differed from that used on earlier releases, and that difference may affect your ability to reproduce the test results for today’s article if you are using an older release of Oracle Database.

CREATE TABLE T5 (
  C1 VARCHAR2(10),
  C2 VARCHAR2(100),
  PRIMARY KEY (C1));

INSERT INTO T5 NOLOGGING
SELECT
  'A'||TO_CHAR(ROWNUM,'0000000'),
  RPAD('A',100,'A')
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T5',CASCADE=>TRUE)

The test script that will be used is as follows, executed after Wireshark packet capturing is enabled on the client computer:

SET AUTOTRACE TRACEONLY STATISTICS

SET ARRAYSIZE 10

SELECT /*+ USE_NL(T51 T52 T53 T54 T55 T56 T57 T58) */
  T51.C1,
  T51.C2,
  T52.C1,
  T52.C2,
  T53.C1,
  T53.C2,
  T54.C1,
  T54.C2,
  T55.C1,
  T55.C2,
  T56.C1,
  T56.C2,
  T57.C1,
  T57.C2,
  T58.C1,
  T58.C2
FROM
  T5 T51,
  T5 T52,
  T5 T53,
  T5 T54,
  T5 T55,
  T5 T56,
  T5 T57,
  T5 T58
WHERE
  T51.C1=T52.C1
  AND T51.C1=T53.C1
  AND T51.C1=T54.C1
  AND T51.C1=T55.C1
  AND T51.C1=T56.C1
  AND T51.C1=T57.C1
  AND T51.C1=T58.C1
  AND T51.C1 BETWEEN 'A 0001000' AND 'A 1000000';

host sleep 5

SET ARRAYSIZE 50

SELECT /*+ USE_NL(T51 T52 T53 T54 T55 T56 T57 T58) */
  T51.C1,
  T51.C2,
  T52.C1,
  T52.C2,
  T53.C1,
  T53.C2,
  T54.C1,
  T54.C2,
  T55.C1,
  T55.C2,
  T56.C1,
  T56.C2,
  T57.C1,
  T57.C2,
  T58.C1,
  T58.C2
FROM
  T5 T51,
  T5 T52,
  T5 T53,
  T5 T54,
  T5 T55,
  T5 T56,
  T5 T57,
  T5 T58
WHERE
  T51.C1=T52.C1
  AND T51.C1=T53.C1
  AND T51.C1=T54.C1
  AND T51.C1=T55.C1
  AND T51.C1=T56.C1
  AND T51.C1=T57.C1
  AND T51.C1=T58.C1
  AND T51.C1 BETWEEN 'A 0001000' AND 'A 1000000';

host sleep 5

SET ARRAYSIZE 100

SELECT /*+ USE_NL(T51 T52 T53 T54 T55 T56 T57 T58) */
  T51.C1,
  T51.C2,
  T52.C1,
  T52.C2,
  T53.C1,
  T53.C2,
  T54.C1,
  T54.C2,
  T55.C1,
  T55.C2,
  T56.C1,
  T56.C2,
  T57.C1,
  T57.C2,
  T58.C1,
  T58.C2
FROM
  T5 T51,
  T5 T52,
  T5 T53,
  T5 T54,
  T5 T55,
  T5 T56,
  T5 T57,
  T5 T58
WHERE
  T51.C1=T52.C1
  AND T51.C1=T53.C1
  AND T51.C1=T54.C1
  AND T51.C1=T55.C1
  AND T51.C1=T56.C1
  AND T51.C1=T57.C1
  AND T51.C1=T58.C1
  AND T51.C1 BETWEEN 'A 0001000' AND 'A 1000000';

host sleep 5

SET ARRAYSIZE 500

SELECT /*+ USE_NL(T51 T52 T53 T54 T55 T56 T57 T58) */
  T51.C1,
  T51.C2,
  T52.C1,
  T52.C2,
  T53.C1,
  T53.C2,
  T54.C1,
  T54.C2,
  T55.C1,
  T55.C2,
  T56.C1,
  T56.C2,
  T57.C1,
  T57.C2,
  T58.C1,
  T58.C2
FROM
  T5 T51,
  T5 T52,
  T5 T53,
  T5 T54,
  T5 T55,
  T5 T56,
  T5 T57,
  T5 T58
WHERE
  T51.C1=T52.C1
  AND T51.C1=T53.C1
  AND T51.C1=T54.C1
  AND T51.C1=T55.C1
  AND T51.C1=T56.C1
  AND T51.C1=T57.C1
  AND T51.C1=T58.C1
  AND T51.C1 BETWEEN 'A 0001000' AND 'A 1000000';

host sleep 5

SET ARRAYSIZE 1000

SELECT /*+ USE_NL(T51 T52 T53 T54 T55 T56 T57 T58) */
  T51.C1,
  T51.C2,
  T52.C1,
  T52.C2,
  T53.C1,
  T53.C2,
  T54.C1,
  T54.C2,
  T55.C1,
  T55.C2,
  T56.C1,
  T56.C2,
  T57.C1,
  T57.C2,
  T58.C1,
  T58.C2
FROM
  T5 T51,
  T5 T52,
  T5 T53,
  T5 T54,
  T5 T55,
  T5 T56,
  T5 T57,
  T5 T58
WHERE
  T51.C1=T52.C1
  AND T51.C1=T53.C1
  AND T51.C1=T54.C1
  AND T51.C1=T55.C1
  AND T51.C1=T56.C1
  AND T51.C1=T57.C1
  AND T51.C1=T58.C1
  AND T51.C1 BETWEEN 'A 0001000' AND 'A 1000000';

host sleep 5

SET ARRAYSIZE 5000

SELECT /*+ USE_NL(T51 T52 T53 T54 T55 T56 T57 T58) */
  T51.C1,
  T51.C2,
  T52.C1,
  T52.C2,
  T53.C1,
  T53.C2,
  T54.C1,
  T54.C2,
  T55.C1,
  T55.C2,
  T56.C1,
  T56.C2,
  T57.C1,
  T57.C2,
  T58.C1,
  T58.C2
FROM
  T5 T51,
  T5 T52,
  T5 T53,
  T5 T54,
  T5 T55,
  T5 T56,
  T5 T57,
  T5 T58
WHERE
  T51.C1=T52.C1
  AND T51.C1=T53.C1
  AND T51.C1=T54.C1
  AND T51.C1=T55.C1
  AND T51.C1=T56.C1
  AND T51.C1=T57.C1
  AND T51.C1=T58.C1
  AND T51.C1 BETWEEN 'A 0001000' AND 'A 1000000';

Windows client computers do not have an operating system sleep command, so if the client is running on Windows, create a file named sleep.vbs in the current folder used by SQL*Plus with the following code:

Const lngSleepSecondsDefault = 600
Dim lngSleepSeconds
Dim objScriptArguments
Dim strArgument

Set objScriptArguments = WScript.Arguments

If objScriptArguments.Count = 0 then
  lngSleepSeconds = lngSleepSecondsDefault
Else
  For Each strArgument in objScriptArguments
    If IsNumeric(strArgument) Then
      lngSleepSeconds = cLng(strArgument)
      Exit For
    End If
  Next
  If lngSleepSeconds <= 0 Then
    lngSleepSeconds = lngSleepSecondsDefault
  End If
End If

WScript.Sleep lngSleepSeconds * 1000

Before enabling the Wireshark capture, execute the test SQL statement a couple of times to limit the potential effects of a cold cache and physical reads unfairly affecting the first execution with the small array fetch size once packet logging is enabled. Start the Wireshark capture on the client, and then execute the above test script in SQL*Plus on the client computer.  The client will fetch approximately 999,001 rows from the database, and will likely result in a Wireshark packet capture totalling about 600MB.  Later processing of the packet capture, as depicted later in this article, will certainly tax the memory limits of 32 bit operating systems, and may result in Wireshark crashing (the latest beta version of Wireshark was used in this test).  Additionally, processing of the resulting packet capture is extremely CPU intensive – you have been warned.

First, we will start the analysis by right-clicking one of the TNS type packets, and select Follow TCP Stream.  The resulting window that is eventually displayed will not be used, we are just using that feature to enter the Filter criteria shown near the top of the Wireshark window with the light green background in the picture below:

As shown above, when a line is checked in the lower pane, such as Time delta from previous captured frame, the status bar at the bottom of the window is updated with a field name, in this case frame.time_delta.  If you manually review the Time column in the Wireshark window you will see that the time delta between packets varies quite a bit.  I wonder if the time delta values vary depending on the array fetch size… we could extract the information and process it a spreadsheet package to determine whether the time deltas change.  More on this later.  The packet frame length for this packet is 704 bytes.

If we scroll down to the bottom of the lower pane, we see that Wireshark has attempted to decode this TNS packet, all except for decrypting the control characters and binary data in the data portion of the packet.

We see that the TNS packet length is 650 of the packet frame’s 704 bytes, and the actual data contained in the packet is 640 of those 650 bytes.  The status bar shows that the field name for the data length is data.len – we will use this information later.  The data section shows that this is the submission of the first SQL statement in our test script.

The following screenshot is showing where the server is returning the data to the client early during the execution of the test script.  In this case, with the array fetch size set to 10, the server had to break the selected data into two network packets, with the first containing bytes 0 through 1459, and the second containing bytes 1460 through 1760 (just 301 bytes in the second packet).

Scrolling down near the end of the packet capture, we see one of the network packets where the array fetch size of 5,000 was in use.  The displayed time delta is 0.000341 seconds, which is very good for a wireless network.

Moving down to the next packet, we find that this packet is 855 bytes in length, but the total TCP length is 8,155 bytes, compared to the value 1,761 that we found when the array fetch size of 10 was in use.

The field name for the reassembled TCP length, as displayed in the status bar, is tcp.reassembled.length – we will use this later.

Now for the exciting part, graphing the result.  If you had to make a choice of which array fetch size to use: 10, 50, 100, 500, 1000, or 5000, which should you use, and why?  From the Statistics menu in Wireshark, select the IO Graphs menu item.  You should see something like this if you set the Tick Interval to 10 seconds and the Pixels per tick to 10.  The Tick Interval value determines how the statistics will be grouped together, in this case it is 10 second intervals.  The Pixels per tick value determines how horizontally spread out, the distance between the data items when displayed on screen.

The first bump width in the values represents the time required to retrieve the data 10 rows at a time.  The second bump width represents the time required to retrieve 50 rows at a time.  The third bump width represents the time required to retrieve 100 rows at a time.  The remaining bump widths show the time required to fetch 500, 1000, and 5000 rows at a time, respectively.  The height of the bumps, in this case, indicates the number of packets transmitted between the server and the client in each of the 10 second time intervals.  When the client retrieved 10 rows at a time, only about 50% of the number of network packets flowed between the server and client as when the client retrieved 50 or 100 rows at a time – this of course increased the time required to transmit all of the requested rows.

By changing Packets/Tick to Bytes/Tick we are able to determine how much data (including overhead) is actually contained in the packets flowing across the network every 10 seconds.  While the bumps in the previous screenshot showed that roughly the same number of packets flowed across the network for the array fetch sizes of 50 and 100, this screenshot shows that the amount of usable data (plus overhead) in the packets was about 20% less – this explains why the bump representing the array fetch size of 50 is about 20% longer than the bump representing the array fetch size of 100.

Interesting, but why were there only about half as many packets transferred every 10 seconds when the array fetch size was set to 10 compared to when it was set to 50?  Did I hear someone say “wait and see”, or was it “latency”?  🙂  If we change the Units field to Advanced we are able to find out what is happening.  For the Filter of Graph 1 I specified tcp.stream eq 1 (the same filter as was entered automatically when we earlier selected to Follow the TCP Stream).  For the Calc entry I selected Max(*) and then in the next field to the right I specified frame.time_delta_displayed – we saw earlier where this field was obtained in the main Wireshark window.  I then set the scale to 100000 so that I was able to see both the peaks and valleys in the graph.  The graph shows that there is a maximum latency between packets of about 50ms when the array fetch size was set to 10, and that maximum latency dropped significantly until the array fetch size increased to 1,000.  With the array fetch size set to 5,000 the maximum latency is nearly identical to that of the maximum latency when the array fetch size was set to 10 – at this point the client is essentially choking on all of the data that it is receiving, it is taking the client roughly 50ms to process the rows and send the next fetch request to the server.

Neat, what else are we able to do?  This time we will turn off the first graph an create two additional graphs.  The first of the additional graphs shows the sum of the sizes of the TNS section of the network packets for each of the 10 second time intervals, while the second of the additional graphs shows the maximum data length in the TNS section – how much actual data is stuffed into the network packets (per the configured Oracle SDU, and may actually be broken into multiple packets).  A logarithmic scale is selected for the data since the sum of the TNS section lengths will be considerably larger than the maximum data length in any given 10 second time interval.  The green line, representing the amount of actual data in the packets, shows that in a best case scenario, when the array fetch size is set to 10 all of the packets are leaving without being full, or more correctly, the SDU size is lower – so, not only are half as many packets transmitted per second, but in a best case each packet also contains less data.  The situation improves a bit when the fetch array size increased to 50, and then stabilized at a higher value when the fetch array size was bumped to 100.

The previous graph is somewhat useful, yet at the same time somewhat useless.  Rather than looking at the maximum of how full the data packets were in each 10 second time interval, it is probably more useful to look at how full the packets were on average.  Well, the following graph does not actually show that – it shows the amount of data Oracle is packaging before transmission, as controlled by the session data unit (SDU) size, which typically has a maximum default value of 8KB.  The following modified version of the previous chart shows how the average SDU size of the packets changed as the array fetch size increased.  The effective average SDU size doubled as the array fetch size increased from 100 to 500 – does that mean that the transfer completed twice as fast?  No, as shown in the previous graphs, but it might have made a greater difference with a gigabit wired network.

Wireshark also provides the option of controlling the graph type.  In the following screenshot I added the tcp.reassembled.length as Graph 4, changed Graph 2 to an Impulse style (vertical line) and changed Graph 3 to FBar style (vertical rectangle).  Notice that the tcp.reassembled.length values are greater when the array fetch size is 100 than when it is at any other value.  I believe that the reason why this is the case is because on average there was a greater percentage of “overhead” bytes in the TNS section of the packets when array fetching 100 rows at a time when compared to array fetching, for example, 500 rows at a time.

With the above in mind, what is the ideal array fetch size for this SQL statement?  Stop playing with your data packets and answer the question.  🙂

——–

Added May 13, 2010:

First 4 packets retrieving rows after the initial single row fetch, fetch array size set to 10, response to 3 fetch calls (click to zoom):

First 4 packets retrieving rows after the initial single row fetch, fetch array size set to 5,000, response to 1 fetch call (click to zoom):





Network Monitoring Experimentations 4

25 04 2010

April 25, 2010

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Previous articles in this series showed a number of interesting troubleshooting techniques that take advantage of the Wireshark packet capture utility.  A recent OTN thread made me wonder if there isn’t a lot more than Wireshark offers, which might be helpful to Oracle DBAs.  To demonstrate, first we need to create a simple test table with 1,000 rows:

CREATE TABLE T10(
  C1 NUMBER,
  C2 VARCHAR2(12),
  C4 CHAR(12),
  C3 DATE);

INSERT INTO
  T10
SELECT
  ROWNUM,
  'A'||TO_CHAR(ROWNUM)||'Z',
  'A'||TO_CHAR(ROWNUM)||'Z',
  SYSDATE+ROWNUM/10
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

COMMIT;

SET ARRAYSIZE 100

To begin the test, we will enable packet logging in Wireshark, then execute the following in our SQL*Plus session:

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'AUTOTRACE_STATISTICS_TRACE';

SET AUTOTRACE TRACEONLY STATISTICS

SELECT
  *
FROM
  T10
ORDER BY
  C1;

DISCONNECT ALL;

For my test execution, the following was displayed in SQL*Plus:

1000 rows selected.

Statistics
---------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
      33470  bytes sent via SQL*Net to client
        459  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
       1000  rows processed

It might be interesting to determine how SQL*Plus was able to indicate the number of consistent gets, sorts, and various other information.  First, let’s take a look at the raw 10046 trace file that was generated:

EXEC #5:c=0,e=21,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=491526459579
WAIT #5: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526460488
WAIT #5: nam='SQL*Net message from client' ela= 3932 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526464434
CLOSE #5:c=0,e=3,dep=0,type=1,tim=491526464492
=====================
PARSING IN CURSOR #5 len=40 dep=0 uid=194 oct=3 lid=194 tim=491526464547 hv=3933222116 ad='36cab5360' sqlid='dyk4dprp70d74'
SELECT DECODE('A','A','1','2') FROM DUAL
END OF STMT
PARSE #5:c=0,e=29,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1388734953,tim=491526464546
EXEC #5:c=0,e=14,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1388734953,tim=491526464596
WAIT #5: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526464624
FETCH #5:c=0,e=9,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=1388734953,tim=491526464651
STAT #5 id=1 cnt=1 pid=0 pos=1 obj=0 op='FAST DUAL  (cr=0 pr=0 pw=0 time=0 us cost=2 size=0 card=1)'
WAIT #5: nam='SQL*Net message from client' ela= 1641 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526466337
CLOSE #5:c=0,e=6,dep=0,type=1,tim=491526466385
=====================
PARSING IN CURSOR #5 len=33 dep=0 uid=194 oct=3 lid=194 tim=491526466449 hv=2185052098 ad='36c33c460' sqlid='bzmk73q13ugy2'
SELECT DISTINCT SID FROM V$MYSTAT
END OF STMT
PARSE #5:c=0,e=27,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2877173184,tim=491526466448
EXEC #5:c=0,e=35,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2877173184,tim=491526466521
WAIT #5: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526466548
WAIT #5: nam='asynch descriptor resize' ela= 1 outstanding #aio=0 current aio limit=4294967295 new aio limit=129 obj#=-1 tim=491526467243
FETCH #5:c=0,e=705,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=2877173184,tim=491526467269
STAT #5 id=1 cnt=1 pid=0 pos=1 obj=0 op='HASH UNIQUE (cr=0 pr=0 pw=0 time=0 us cost=1 size=21 card=1)'
STAT #5 id=2 cnt=588 pid=1 pos=1 obj=0 op='FIXED TABLE FULL X$KSUMYSTA (cr=0 pr=0 pw=0 time=821 us cost=0 size=630 card=30)'
STAT #5 id=3 cnt=1 pid=2 pos=1 obj=0 op='FIXED TABLE FULL X$KSUSGIF (cr=0 pr=0 pw=0 time=0 us cost=0 size=4 card=1)'
WAIT #5: nam='SQL*Net message from client' ela= 2708 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526470036
CLOSE #5:c=0,e=7,dep=0,type=1,tim=491526470085
*** SESSION ID:(1229.11429) 2010-04-25 10:13:17.869
*** SERVICE NAME:(SYS$USERS) 2010-04-25 10:13:17.869
*** MODULE NAME:(SQL*Plus) 2010-04-25 10:13:17.869
*** ACTION NAME:() 2010-04-25 10:13:17.869

=====================
PARSING IN CURSOR #5 len=298 dep=0 uid=194 oct=3 lid=194 tim=491526480837 hv=3566218912 ad='36cdd1b90' sqlid='f7dctcva90bp0'
SELECT STATISTIC# S, NAME FROM SYS.V_$STATNAME WHERE NAME IN ('recursive calls','db block gets','consistent gets','physical reads','redo size','bytes sent via SQL*Net to client','bytes received via SQL*Net from client','SQL*Net roundtrips to/from client','sorts (memory)','sorts (disk)') ORDER BY S
END OF STMT
PARSE #5:c=0,e=24,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=601351913,tim=491526480836
EXEC #5:c=0,e=16,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=601351913,tim=491526480950
WAIT #5: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526480978
FETCH #5:c=0,e=14,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=601351913,tim=491526481009
WAIT #5: nam='SQL*Net message from client' ela= 1461 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526482489
WAIT #5: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526482574
FETCH #5:c=0,e=248,p=0,cr=0,cu=0,mis=0,r=9,dep=0,og=1,plh=601351913,tim=491526482778
STAT #5 id=1 cnt=10 pid=0 pos=1 obj=0 op='FIXED TABLE FULL X$KSUSD (cr=0 pr=0 pw=0 time=0 us cost=0 size=360 card=10)'
WAIT #5: nam='SQL*Net message from client' ela= 4778 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526487592
*** SESSION ID:(1229.11429) 2010-04-25 10:13:17.869

CLOSE #5:c=0,e=8,dep=0,type=1,tim=491526491403
=====================
PARSING IN CURSOR #5 len=35 dep=0 uid=194 oct=3 lid=194 tim=491526491485 hv=410341182 ad='36c4497b0' sqlid='1y2frjhc7amty'
SELECT
  *
FROM
  T10
ORDER BY
  C1
END OF STMT
PARSE #5:c=0,e=26,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4242492972,tim=491526491484
EXEC #5:c=0,e=12,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4242492972,tim=491526491548
WAIT #5: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526491574
WAIT #5: nam='Disk file operations I/O' ela= 121 FileOperation=2 fileno=8 filetype=2 obj#=-1 tim=491526491726
FETCH #5:c=0,e=446,p=0,cr=7,cu=0,mis=0,r=1,dep=0,og=1,plh=4242492972,tim=491526492037
WAIT #5: nam='SQL*Net message from client' ela= 3403 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526495465
WAIT #5: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526495529
FETCH #5:c=0,e=65,p=0,cr=0,cu=0,mis=0,r=100,dep=0,og=1,plh=4242492972,tim=491526495589
WAIT #5: nam='SQL*Net message from client' ela= 3025 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526498633
WAIT #5: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526498679
FETCH #5:c=0,e=81,p=0,cr=0,cu=0,mis=0,r=100,dep=0,og=1,plh=4242492972,tim=491526498754
WAIT #5: nam='SQL*Net message from client' ela= 2924 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526501694
WAIT #5: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526501739
FETCH #5:c=0,e=82,p=0,cr=0,cu=0,mis=0,r=100,dep=0,og=1,plh=4242492972,tim=491526501815
WAIT #5: nam='SQL*Net message from client' ela= 2873 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526504704
WAIT #5: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526504749
FETCH #5:c=0,e=82,p=0,cr=0,cu=0,mis=0,r=100,dep=0,og=1,plh=4242492972,tim=491526504825
WAIT #5: nam='SQL*Net message from client' ela= 2952 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526507793
WAIT #5: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526507839
FETCH #5:c=0,e=82,p=0,cr=0,cu=0,mis=0,r=100,dep=0,og=1,plh=4242492972,tim=491526507914
WAIT #5: nam='SQL*Net message from client' ela= 3453 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526511383
WAIT #5: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526511433
FETCH #5:c=0,e=107,p=0,cr=0,cu=0,mis=0,r=100,dep=0,og=1,plh=4242492972,tim=491526511536
WAIT #5: nam='SQL*Net message from client' ela= 2857 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526514412
WAIT #5: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526514453
FETCH #5:c=0,e=53,p=0,cr=0,cu=0,mis=0,r=100,dep=0,og=1,plh=4242492972,tim=491526514501
WAIT #5: nam='SQL*Net message from client' ela= 2812 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526517332
WAIT #5: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526517377
FETCH #5:c=0,e=58,p=0,cr=0,cu=0,mis=0,r=100,dep=0,og=1,plh=4242492972,tim=491526517429
WAIT #5: nam='SQL*Net message from client' ela= 2897 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526520366
WAIT #5: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526520412
FETCH #5:c=0,e=58,p=0,cr=0,cu=0,mis=0,r=100,dep=0,og=1,plh=4242492972,tim=491526520463
WAIT #5: nam='SQL*Net message from client' ela= 2849 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526523328
WAIT #5: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526523374
FETCH #5:c=0,e=67,p=0,cr=0,cu=0,mis=0,r=99,dep=0,og=1,plh=4242492972,tim=491526523435
STAT #5 id=1 cnt=1000 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=7 pr=0 pw=0 time=285 us cost=4 size=31000 card=1000)'
STAT #5 id=2 cnt=1000 pid=1 pos=1 obj=82545 op='TABLE ACCESS FULL T10 (cr=7 pr=0 pw=0 time=249 us cost=3 size=31000 card=1000)'
WAIT #5: nam='SQL*Net message from client' ela= 4630 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=491526528115
*** SESSION ID:(1229.11429) 2010-04-25 10:13:17.947

XCTEND rlbk=0, rd_only=1, tim=491526563841
=====================
PARSING IN CURSOR #6 len=447 dep=1 uid=0 oct=2 lid=0 tim=491526563926 hv=1097020010 ad='36cc895f8' sqlid='f711myt0q6cma'
insert into sys.aud$( sessionid,entryid,statement,ntimestamp#, userid,userhost,terminal,action#,returncode, logoff$lread,logoff$pread,logoff$lwrite,logoff$dead, logoff$time,comment$text,spare1,clientid,sessioncpu,proxy$sid,user$guid, instance#,process#,auditid,dbid) values(:1,:2,:3,SYS_EXTRACT_UTC(SYSTIMESTAMP),     :4,:5,:6,:7,:8,     :9,:10,:11,:12,     cast(SYS_EXTRACT_UTC(systimestamp) as date),:13,:14,:15,:16,:17,:18,     :19,:20,:21,:22)
END OF STMT
PARSE #6:c=0,e=18,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=0,tim=491526563926
EXEC #6:c=0,e=67,p=0,cr=1,cu=2,mis=0,r=1,dep=1,og=4,plh=0,tim=491526564082
STAT #6 id=1 cnt=0 pid=0 pos=1 obj=0 op='LOAD TABLE CONVENTIONAL  (cr=1 pr=0 pw=0 time=0 us)'
CLOSE #6:c=0,e=4,dep=1,type=1,tim=491526564123
WAIT #0: nam='log file sync' ela= 33 buffer#=3872 sync scn=1305056145 p3=0 obj#=-1 tim=491526564191
CLOSE #2:c=0,e=6,dep=0,type=0,tim=491526564244
CLOSE #5:c=0,e=3,dep=0,type=0,tim=491526564260
CLOSE #4:c=0,e=4,dep=0,type=0,tim=491526564271
CLOSE #3:c=0,e=9,dep=0,type=0,tim=491526564287
CLOSE #1:c=0,e=3,dep=0,type=0,tim=491526564298

With the help of these articles, 10046 Extended SQL Trace Interpretation and 10046 Extended SQL Trace Interpretation 2, we are able to decode most of the above trace file output that was generated by Oracle Database 11.2.0.1.  For example, we see these helpful SQL statement before the SQL statement that we executed:

SELECT DECODE('A','A','1','2') FROM DUAL

SELECT DISTINCT SID FROM V$MYSTAT

SELECT STATISTIC# S, NAME FROM SYS.V_$STATNAME WHERE NAME IN
  ('recursive calls','db block gets','consistent gets','physical reads','redo size',
  'bytes sent via SQL*Net to client','bytes received via SQL*Net from client',
  'SQL*Net roundtrips to/from client','sorts (memory)','sorts (disk)') ORDER BY S

After the SQL statement finished executing the execution plan for the SQL statement was written to the trace file, a COMMIT was executed that did not affect any data, and then this SQL statement appeared:

insert into sys.aud$( sessionid,entryid,statement,ntimestamp#, userid,userhost,
terminal,action#,returncode, logoff$lread,logoff$pread,logoff$lwrite,logoff$dead,
logoff$time,comment$text,spare1,clientid,sessioncpu,proxy$sid,user$guid, instance#,
process#,auditid,dbid) values(:1,:2,:3,SYS_EXTRACT_UTC(SYSTIMESTAMP),
:4,:5,:6,:7,:8,     :9,:10,:11,:12,     cast(SYS_EXTRACT_UTC(systimestamp) as date),
:13,:14,:15,:16,:17,:18,     :19,:20,:21,:22)

The 10046 trace file successfully hid how SQL*Plus was able to determine the number of consistent gets, sorts, and various other information for the SQL statement.  It must be magic?  Or maybe something else is happening… such as the creation of a secondary session to retrieve the statistics.  Maybe Wireshark is able to help answer the question (note that some information in the Wireshark output is obscured – possibly you would not want someone seeing that information, or for that matter any of this “clear text” information, which would be clearly visible to all computers on a wireless network, as well as on networks that utilize hubs rather than switches, if the Oracle Advanced Security Option is not purchased and enabled).

There were two unrelated packets at the start of the Wireshark capture, so I right-clicked the first packet of interest and set it to be the Time Reference packet.  The data section of the first packet of interest shows the ALTER SESSION command that enabled the 10046 trace:

 —

A couple of packets later we see the first SQL statement that was not explicitly executed in our test script:

Finally, packet 23 shows the SQL statement that we executed in our test script:

Packet 24 shows the retrieval of the first row from the database, on the lines between bytes 0150 (336 in decimal) and 0170 (368 in decimal).  Packet 25 contains the request for the next set of 100 rows, and packet 26 begins the transmission of those 100 rows.

Packet 75 shows another SQL statement that was not explicitly included in the test script – this SQL statement also was not included in the 10046 trace file.  The SQL statement retrieves the ending statistics for the session, allowing SQL*Plus to display the output of the delta values for those statistics.

One of the problems when reviewing the packet trace output, of course, is that it is difficult to see the full picture of what is happening.  Wireshark allows right-clicking one of the packets in the conversation and then offers the option of Follow TCP Stream.  For example, this is the Follow TCP Stream output at the start of the test script execution, with the red colored text showing the client-side transmitted data and the blue colored text showing the server transmitted data.

Scrolling down a little further we see the SQL statement used to determine the STATISTIC# values, followed by the SQL statement that was executed in the secondary session which determined the current statistic values for the session of interest.

At the end of the transfer we see the last row sent by the database server containing the row with a column value of A1000Z, immediately followed by the SQL statement that was executed in the secondary session which determined the ending statistic values for the session of interest:

There were a lot of . (period) characters in the Raw output, designating unprintable characters.  Might some of these unprintable characters be the missing number column, the missing date column, or various control characters?  Switching to the Hex Dump view might be helpful.  In the below picture we see the SQL statement from our script, followed by a single row returned from the database, followed by another fetch call from the client, followed by the next 100 rows from the database. 

How do I know that the client is retrieving the next 100 rows?  Take a close look at the number displayed to the right of 00000DAF, in position 00000DB0.  The hexidecimal number 64 (100 in decimal) appears in that position.  How do I know that this is the number of requested rows?  A simple test script will confirm that the number of requested rows appears in that byte position:

SET AUTOTRACE TRACEONLY STATISTICS

SET ARRAYSIZE 10

SELECT
  *
FROM
  T10
ORDER BY
  C1;

SET ARRAYSIZE 50

SELECT
  *
FROM
  T10
ORDER BY
  C1;

SET ARRAYSIZE 100

SELECT
  *
FROM
  T10
ORDER BY
  C1;

SET ARRAYSIZE 500

SELECT
  *
FROM
  T10
ORDER BY
  C1;

I saved the Hex Dump output from Wireshark’s follow stream to a file: NetworkMonitoringExperimentations4-FetchSize.txt (save with a .txt extension).  Feel free to confirm that the array fetch size is provided in that byte location.

ARRAYSIZE=10
00000324  00 15 00 00 06 00 00 00  00 00 03 05 3b 05 00 00 ........ ....;...
00000334  00 0a 00 00 00                                   .....
...
00000339  00 15 00 00 06 00 00 00  00 00 03 05 3c 05 00 00 ........ ....<...
00000349  00 0a 00 00 00                                   .....
...
...
ARRAYSIZE=50
00000F13  00 15 00 00 06 00 00 00  00 00 03 05 aa 05 00 00 ........ ........
00000F23  00 32 00 00 00                                   .2...
...
00000F28  00 15 00 00 06 00 00 00  00 00 03 05 ab 05 00 00 ........ ........
00000F38  00 32 00 00 00                                   .2...
...
...
ARRAYSIZE=100
00001472  00 15 00 00 06 00 00 00  00 00 03 05 c9 05 00 00 ........ ........
00001482  00 64 00 00 00                                   .d...
...
00001487  00 15 00 00 06 00 00 00  00 00 03 05 ca 05 00 00 ........ ........
00001497  00 64 00 00 00                                   .d...
...
...
ARRAYSIZE=500
000018FF  00 15 00 00 06 00 00 00  00 00 03 05 de 05 00 00 ........ ........
0000190F  00 f4 01 00 00                                   .....
...
00001914  00 15 00 00 06 00 00 00  00 00 03 05 df 05 00 00 ........ ........
00001924  00 f4 01 00 00                                   .....

The byte containing the array size indicator seems to be incorrect when the fetch array size is set to 500 – F4 is equivalent to the decimal number 244.  True, but F401, or with the high byte first, 01F4 is equivalent to the decimal number 500.

As a late edition to this blog article, I submitted the same SQL statement using a client that is able to retrieve the full fetch array size of 100 in the first fetch call.  Below is the packet capture for that execution – compare the packet capture with the last of the above pictures to see if you are able to determine where SQL*Plus specified that the array fetch size for the first fetch should be only 1, rather than the user specified 100.

A search through YouTube locates a number of helpful Wireshark videos, including this one that shows how to use the Follow Stream feature.

So, the question remains, how do we decode the rest of the conversation between the client and server?  For instance, what happened to the number and date column values in the packet capture?  Does this OTN tread provide a clue?





Network Monitoring Experimentations 3

5 01 2010

January 5, 2010

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Previously, I showed a couple of examples of network communication working well, as well as a couple counter-examples where the network communication caused problems for Oracle Database communication.  Problems like those shown in the previous articles are not exclusive to just database communication, but instead potentially impact all network communication between two devices attached to the network.

Here is a question that I asked potential job candidates who were interviewing for a position at the company – not so much looking for a correct answer, but rather the thought process demonstrated by the person who was looking to be hired:

—————————————————————————————————–

Situation: All servers are connected to an HP 4160GL 60 port gigabit managed switch using CAT 5e cables.  A new Dell PowerEdge 2850 server running Windows 2003 Standard Edition and with all security updates is installed.  The PowerEdge 2850 indicated that its network card was connected at gigabit speeds, and the HP 4160GL switch also reported that the server was connected at gigabit speeds.  Copying files from the Dell PowerEdge 1750 server running Red Hat Enterprise Linux ES 3 Server running SAMBA was extremely slow.  The PowerEdge 1750 indicated that its network card was connected at gigabit speeds, and the HP 4160GL switch also reports that the server is connected at gigabit speeds.

Initial Analysis: Pull copying the Windows 2000 Service Pack 4 file, which is roughly 132MB in size, to the PowerEdge 2850 (Windows 2003) from the PowerEdge 1750 (RH Linux) using Windows Explorer required more than 45 minutes to complete.  Pull copying the same file to a five year old laptop with a 100Mbps connection running Windows 95 from the PowerEdge 1750 using Windows Explorer required roughly 22 seconds to complete (and 17 seconds on a desktop computer with a 100Mbps connection).  Pull copying the same file to a year old desktop computer with a Gigabit connection running Windows XP from the PowerEdge 1750 using Windows Explorer required roughly 4 seconds to complete.

What would you do to troubleshoot this problem?  What is the cause of the problem?  What is the solution?

This is the start of the problematic transfer, about 1.3 seconds after logging was initiated with Wireshark (actually Ethereal at the time) – IP address .47 is the Windows 2003 Server with Service Pack 1, IP address .46 is the Red Hat Enterprise Linux ES 3 Server w/ SAMBA:

Roughly 4 seconds into the copy, only about 200 packets were logged in the packet capture program:

The problem was corrected and then the transfer was captured again.  This is the start of the fixed run (about 1 second after logging was initiated) – IP address .47 is the Windows 2003 Server with Service Pack 1, IP address .46 is the Red Hat Enterprise Linux ES 3 Server w/ SAMBA:

Roughly 3.4 seconds after the file copy started, the file copy completed.  Roughly 80,745 packets were transmitted across the network during the copy operation:

—————————————————————————————————–

What would you do to troubleshoot this problem?  What is the cause of the problem?  What is the solution?





Network Monitoring Experimentations 2

16 12 2009

December 16, 2009

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

The first post in this series introduced Wireshark, demonstrated some of the problems that may be detected with the tool, demonstrated what happens with various fetch array sizes in the presence of configuration problems, and compared a low latency 100Mb connection with a higher latency wireless 802.11G (54Mb) connection from the same client computer during a transfer of database table data.

The connection attempt shown in the screen capture from the previous post was of the computer connecting through the wireless connection to a server during the launch of an ERP package. That connection attempt was slowed a bit by the network latency. That screen shot is shown below.

Why introduce the same screen shot again?  Let’s look at what happens sometimes when application versions are upgraded and in the process (sometimes undocumented) features are added to applications.  Take, for instance, this ERP program’s connection attempt that required at least 10 seconds to complete on a gigabit connection, when it is able to complete in less than a second on the much slower wireless connection.  What changed?  Taking a look at a couple of the screen shots of the slow connection attempt will help (the client computer is .185, the database server is .45, and the file server is .40):

In the above we see the client computer searching the file server for a file named DNSAPI.dll that did not exist.  Then we see a DNS lookup attempt for a device named TNS.k-mm.com (for those following along, this was caused by the TNS: keyword on the remotedbname= entry in the SQL.INI file).  Notice that the timestamp of the request is at 17:25:14.683144.  When the client is told by the DNS server for the second time that the name does not exist, the timestamp has advanced to 17:25:14.686069.  Next, the client tried to resolve the name TNS first by a direct WINS (NBNS) lookup and then by a WINS broadcast.  The timestamp of the next client activity after the WINS lookup is 17:25:16.967553, so roughly 2.3 seconds were wasted.  But, that is not the most significant lost time interval.

The above shows the actual connection starting at 17:25:16.988701 and completing at 17:25:17.001425, which is very quick, but are we done yet?

The above screen shot shows the client looking for a file named nethasp.ini on the file server, but is not able to find that file.  Important?

The above shows that the client finally gave up looking for the nethasp.ini file at 17:25:17:209969, and at 17:25:17:232671 started sending a series of four full broadcast packets, spaced roughly one second apart, to a destination port number that Wireshark has aliased as tcpnethaspsrv.  The client finally started communicating again with the database server at 17:25:21.344038.  So, between 17:25:14.686069 and 17:25:21.344038 the client mostly just sat idle waiting for a response that never came.  Yes, Wireshark ultimately identified the cause of the problem and helped find a course of action that was fully in place at 05:30:00 the next morning, with the new version of the ERP program starting in less than a second just like the old version.

Wireshark exposes a lot of detail from the packets that it captures.  Here are two screen shots of a network packet containing a SQL statement that was transmitted in clear text:

Notice that the raw packet data appears at the bottom and is expanded into human readable text in the upper pane.  Clicking a line in the upper pane (for instance one that shows the destination MAC address) shows that information in the raw packet data at the bottom of the window.





Network Monitoring Experimentations 1

15 12 2009

December 15, 2009 

(Forward to the Next Post in the Series)

Wireshark, formerly known as Ethereal, is a free network packet capture program for several operating system platforms.  Wireshark is able to reveal a wide range of network communication problems:
http://www.wireshark.org/download.html

The problems might include self-inflicted problems that result from implementing changes to improve one performance problem, as suggested here:
  http://www.dslreports.com/faq/tweaks 
or here:
  http://download.microsoft.com/download/2/8/0/2800a518-7ac6-4aac-bd85-74d2c52e1ec6/tuning.doc

The problems might include client computers with 100Mb network cards connected into gigabit switches, which then results in unexpected packet retransmits until the client computer is moved to a 100Mb switch that then connects to the gigabit switch (this is a rare problem, somewhere I have a Wireshark capture that shows this behavior).

The problems might include failing network equipment or bad network wiring or excessive EMF in the environment that distorts traffic on CAT 5e, CAT 6, and wireless connections.

The problems might include a client application that unexpectedly takes 10 to 20 seconds to “log in” when it should take 1 second or less.

The problems might include issues with inappropriate fetch array sizes, poor choices for SDU size, forcing jumbo TCP/IP frames through intermediate network hardware that does not support frame sizes larger than roughly 1500 bytes.

The problems might include a high latency network or WAN connections.

First, let’s look at a Wireshark capture of a successful connection attempt from a client computer connected to the network by an 802.11G (54Mb) wireless connection:

There is nothing terribly out of the ordinary with the above.  The client computer in packet 1 sent an ARP broadcast packet to the network asking for the MAC address of the network card on the network that is associated with IP address 192.185.10.52, and the response should be returned to the client computer at IP address 192.185.10.51.  Roughly 0.002 seconds later the client computer attempted to connect to the database server using the TNS protocol.  Roughly 0.1 seconds later the connection completed.  Roughly 0.06 seconds after the connection attempt finished, the client computer started sending queries to the database server.  There were a couple of delays between the submission of the SQL statement and the response from the server, such as the 0.12 second delay between packet 26 and 27, but nothing significant.

It is quite possible that network problems will occur, as in the following:

In the above, notice that the server (IP address 192.185.10.52) is resending packets that it assumed were lost in transit due to the long delays between packets without receiving an ACK from the client computer (the ACK packet may have been lost).  Notice also the long delays between packets that might either be a symptom of network problems, or CPU/Wait Event that could be captured in a 10046 extended SQL trace.

Next, let’s take a look at the effects of adjusting the fetch array size (number of rows retrieved in each fetch call – ARRAYSIZE setting in SQL*Plus) when executing a SQL statement in SQL*Plus that selects from a table having an average row length of 245 bytes, with the client on a wired 100Mb connection, and with the standard Oracle SDU size.  The server is still at IP address 192.185.10.52, the client computer (same as used above) is now at IP address 192.185.10.53.

Fetch Array Size 1:

Fetch Array Size 15:
 

Fetch Array Size 100:

Fetch Array Size 1000:

In the above, you might notice that after every two packets that are sent by the server, the client computer sends back an ACK packet – this is typical behavior.  So, what happens when someone “optimizes” the network card parameters?

Fetch Array Size 15 with “Optimized” ACK frequency (Same Data):

Fetch Array Size 100 with “Optimized” ACK frequency (Same Data):

In the above, notice the number of packets transmitted before the client sends an ACK packet, and typically just before the client sends the ACK, there is a delay of roughly 0.2 seconds.  OK, a little slower.  So, what happens when we switch from the table with the average row length of 245 bytes to a table containing roughly 1MB to 2MB JPEG pictures?  Compare how long it takes to reach the 35th packet in the following two screenshots:

Fetch Array Size 100:

Fetch Array Size 100 with “Optimized” ACK frequency (Same Data):

(Late Additions to the Post)

Fetch Array Size 100 – Table with Pictures (802.11G):

Fetch Array Size 100 – Table with Pictures, Optimized ACK (802.11G):

SQL*Plus SELECT from the table with the average row length of 245 bytes using the 802.11G (54Mb) connection:

Fetch Array Size 1 with “Optimized” ACK frequency:

Fetch Array Size 15 with “Optimized” ACK frequency (Same Data):

Fetch Array Size 100 with “Optimized” ACK frequency (Same Data):

Fetch Array Size 1000 with “Optimized” ACK frequency (Same Data):