Network Monitoring Experimentations 7

August 19, 2010

(Back to the Previous Post in the Series)

In the previous articles of this series we saw several cases where Wireshark and other utilities helped identify various types of network-related problems, and those blog articles provided screen captures that should allow you to identify network patterns that might exist in your own environments.  The question likely remains: “But (as a DBA) will we ever use this information?”  A recent thread on the OTN forums suggests maybe… especially if someone has “optimized” the servers.  I contributed several responses in that thread.

In the forum thread the original poster mentions that he is experiencing performance problems when querying the database over a 1Mb/s WAN connection:

“For example, a query on a heavy records table that retrieves about 700 KB of information in 500 records takes us 3:31 minutes, which gives an average speed of downloading about 3-4 KB/s, though we have upload bandwidth in abundance to the server and more than enough download bandwidth in the customer’s network. With this query specifically, the reception / transmission of data takes place at regular intervals, receiving a chunk of around 20 KB and waiting about 5 seconds to receive new data.”
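A quick sanity check of those numbers (the 700 KB, 500 rows, and 3:31 elapsed time come straight from the quote; the rest is simple arithmetic) confirms just how slow the transfer was compared to the roughly 125KB/s that a 1Mb/s link should be able to sustain:

----
# Rough sanity check of the throughput described by the OP.
kb_transferred = 700.0            # KB retrieved, per the OP
elapsed_seconds = 3 * 60 + 31     # 3 minutes 31 seconds
rows = 500

print("Average throughput: %.1f KB/s" % (kb_transferred / elapsed_seconds))    # ~3.3 KB/s
print("Average time per row: %.2f seconds" % (elapsed_seconds / float(rows)))  # ~0.42 seconds
----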

Waiting 5 seconds to retrieve 20KB – maybe someone swapped the 1Mb/s connection for a 56Kb/s dial-up modem, which should be able to download 5KB to 7KB per second.  Or, maybe there is a better explanation – network monitoring applied.  In the OTN thread Sybrand offered a very good suggestion to use a trace route (TRACERT), the output of which looked something like this:

----
 1  * * * Request timed out.
 2  * * * Request timed out.
 3  69 ms  69 ms  69 ms
 4  69 ms  70 ms  70 ms
 5  70 ms  72 ms  70 ms
 6  69 ms  70 ms  73 ms
 7  70 ms  70 ms  70 ms
 8  82 ms  88 ms  83 ms
 9  97 ms 102 ms  99 ms
10  98 ms  97 ms 101 ms
11  97 ms  98 ms  98 ms
12  99 ms  98 ms  98 ms
13 102 ms  98 ms  97 ms
14  98 ms  97 ms  97 ms
----

The two timeouts at the start of the output imply that there are two devices, very likely firewalls, in the network pathway closest to the computer that executed the command.  The times for a response from each hop on the way to the other side of the connection are between 70ms (0.070 seconds) and 100ms (0.100 seconds).  The high response times imply that the connection probably is not a T1 or similar high-cost 1Mb/s connection, and the low response times imply that the connection cannot be satellite based, where 1000ms (1.000 second) to 2500ms (2.500 seconds) response times would be expected.  The original poster confirmed that the connection is a DSL connection – such a connection likely does not carry a quality of service guarantee in the event that performance is very slow, nor would it take top priority if 95% of the packets are dropped (possibly needing to be retransmitted multiple times).  The times reported by the trace route are important – if a network packet is sent from the server to the client it will probably require between 0.035 and 0.500 seconds for the packet to reach the client, and an equal amount of time for the ACK confirmation from the client, essentially stating that the packet was successfully received and that the client is ready for the next set of packets.
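To put those trace route times into perspective, here is a minimal sketch (the ~1460 byte payload per packet and the ACK-after-every-two-packets behavior are assumptions based on the defaults discussed later in this article) that estimates the best-case throughput when every pair of packets must wait one full round trip for its ACK:

----
# Best-case throughput when the sender stops after every 2 packets and waits
# one full round trip for the ACK (assumed values, not measurements).
payload_bytes = 1460            # typical TCP payload with a 1500 byte MTU
packets_per_ack = 2             # default ACK-every-other-packet behavior
for rtt in (0.07, 0.10, 0.50):  # round trip times suggested by the TRACERT output
    bytes_per_rtt = payload_bytes * packets_per_ack
    print("RTT %.3fs -> ceiling of about %5.1f KB/s" % (rtt, bytes_per_rtt / rtt / 1024.0))
# RTT 0.070s -> ceiling of about  40.7 KB/s
# RTT 0.100s -> ceiling of about  28.5 KB/s
# RTT 0.500s -> ceiling of about   5.7 KB/s
----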

Sybrand then offered another very good suggestion, enabling a 10046 trace (reference #2) while remotely querying the database.  A TKPROF summary of the 10046 trace file showed the following statistics:

Execution Plan Time:   0.013437 seconds
CPU Time:              0.01 seconds
Elapsed Time:        185.38 seconds
Rows:                500

Server-side waits:             0.03 seconds
SQL*Net more data to client: 185.32 seconds, 5.13 seconds max wait, 373 waits
SQL*Net message from client:  10.50 seconds, 5.02 seconds max wait,   6 waits

If you have viewed many TKPROF summaries, or raw 10046 trace files, you would probably recognize that the array fetch size was likely set at 100, and also just how uncommon 5+ second waits are for the “SQL*Net more data to client” wait event.  Even though the Oracle Database documentation provides little information about this event, you should be able to recognize that something is very wrong with the 5.13 second maximum wait time.  In the opening quote of this blog article the OP mentioned that it required about 5 seconds to transmit 20KB over the WAN connection – could it be that one of these 5 second long “SQL*Net more data to client” waits happened during the transfer, making it appear that the average throughput was roughly 4KB/s?
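The reasoning behind the array fetch size guess can be reduced to a couple of divisions (a sketch using the TKPROF numbers above and the 700KB / 500 rows mentioned by the OP; treating one of the six client waits as the initial parse/execute round trip is an assumption):

----
# Rough reasoning behind the "array fetch size of about 100" guess.
rows            = 500
client_waits    = 6          # "SQL*Net message from client" waits from TKPROF
more_data_waits = 373        # "SQL*Net more data to client" waits from TKPROF
total_kb        = 700.0      # per the OP

fetches = client_waits - 1   # one wait assumed to be the initial parse/execute round trip
print("Estimated array fetch size: ~%d rows" % (rows / fetches))                  # ~100
print("Average row length: ~%.1f KB" % (total_kb / rows))                         # ~1.4 KB
print("Data sent per 'more data' wait: ~%.1f KB" % (total_kb / more_data_waits))  # ~1.9 KB
----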

Now what?  We have tried TRACERT and a TKPROF summary of a 10046 trace file.  Maybe manually analyze the raw 10046 trace file looking for patterns?  Or, we can try what I suggested in the thread – network monitoring with Wireshark.  Fortunately, the OP already had access to Wireshark; he just needed a little assistance working through the packet capture.  The OP mentioned:

“But we used Wireshark on the server side and we observed the strange behaviour I spoke of earlier: several packets (all together around 20 KB) and then around 5 seconds of NOTHING, no ACKs and no dropped packets. I didn’t test the client side but I assume that’d be the same stuff.”

The key to understanding what is happening is to determine when the 5 seconds of inactivity happened.  Typically, a computer will send two network packets to a remote device and then wait patiently for an acknowledgement (ACK) that the two packets were received before it sends the next two packets.  If an ACK is not received within 0.2 seconds, the sender will assume that the packets were lost in transit and resend the packets.  So, we need to examine when the 5 seconds of inactivity happened:

  • Immediately after the server sent two packets to the client computer.
  • Immediately after the server sent a single packet to the client computer.
  • Immediately after receiving an ACK packet from the client computer.

In the case of the first bullet point, that might tell us that something happened to the client computer or the DSL connection.  However, the server should have taken action after 0.2 seconds and attempted to send the packets again.  In the case of the second and third bullets, we probably need to look at either what is happening in the server (maybe it took 5 seconds to jump back on the CPU to fetch the next set of rows, but it seems like that would have shown in the TKPROF summary?) or the local network (maybe the switch port dropped out for 5 seconds, but that would have also affected the clients that are local to the server).
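One way to answer that question without scrolling through the capture by hand is to export it from Wireshark as a CSV file and let a short script point out what immediately preceded each long gap (a sketch only; the column names, file name, and server address below are assumptions that must be adjusted to match the actual export):

----
# Sketch: find long gaps in a Wireshark CSV export and report what the server
# was doing just before each gap.
import csv

SERVER_IP = "192.168.1.10"   # hypothetical server address
GAP_SECONDS = 4.0            # flag anything close to the reported 5 second gaps

previous = None
with open("capture.csv") as f:
    for packet in csv.DictReader(f):
        if previous is not None:
            gap = float(packet["Time"]) - float(previous["Time"])
            if gap >= GAP_SECONDS:
                direction = "sent" if previous["Source"] == SERVER_IP else "received"
                print("%.1f second gap after the server %s: %s" %
                      (gap, direction, previous["Info"][:60]))
        previous = packet
----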

Signs of trouble:

“The network administrators from the housing service have reconfigured the interface to force 100 Mb full duplex and guess the time it takes now to the query that took 3:31 s before: 1:38 s!”

Someone has been “optimizing” the servers and/or the network.  What happens when a switch port is forced to 100Mb/s full duplex without touching the server, which by the way has a gigabit network interface?

http://www.cisco.com/en/US/tech/tk389/tk214/technologies_tech_note09186a0080094781.shtml#when:

“One of the most common causes of performance issues on 10/100 Mb Ethernet links occurs when one port on the link operates at half-duplex while the other port operates at full-duplex. This occurs when one or both ports on a link are reset and the auto-negotiation process does not result in both link partners having the same configuration. It also can occur when users reconfigure one side of a link and forget to reconfigure the other side. Both sides of a link should have auto-negotiation on, or both sides should have it off. Cisco recommends to leave auto-negotiation on for those devices compliant with 802.3u.”

http://en.wikipedia.org/wiki/Autonegotiation:

“A duplex mismatch occurs when two connected devices are configured in different duplex modes. This may happen for example if one is configured for autonegotiation while the other one has a fixed mode of operation that is full duplex (no autonegotiation). In such conditions, the autonegotiation device correctly detects the speed of operation, but is unable to correctly detect the duplex mode. As a result, it sets the correct speed but starts using the half duplex mode.”

http://docs.hp.com/en/601/autonegotiation.pdf:

“However, parallel detection cannot detect the duplex mode so the duplex mode will default to Half duplex. The resulting link configuration will be able to send and receive frames but performance will be very poor. The performance will be poor because the Full duplex MAC disables the carrier sense and collision detect circuitry. So, when it has frames to transmit it will transmit irrespective of what the Half Duplex MAC is doing. This will cause collisions with the Full Duplex MAC not backing off.”

From one of my posts in the thread:

“Something seems to be very odd here. First, the two sides are connected by a high latency (Internet) connection with many hops, and an average ping time of about 0.1 seconds. You will probably find that the MTU size (consider this to be the maximum packet size before packet fragmentation) when jumping through the various routers is probably about 1500 bytes. If the network administrators have configured the servers and local switch to support jumbo frames (8KB packets, for instance), those jumbo frames will be split into multiple packets so that they are able to pass through the intermediate routers between the server and client computers (you might not see a problem on the server side while one might appear on the client side). You indicate that the time delta between packets is about 0.0001 seconds (0.1ms), which is significantly less than the ping times suggested by the trace route – how many of those packets appear together with a delay of 0.0001 seconds in between?

If you have a Nagle/delayed ACK problem where one side is set to send an ACK after 13 (or some other number more than 2) packets while the other side is set to send an ACK after the default of 2 packets, that might, in part, explain the 5 seconds where no packets appeared on the server side of the Wireshark capture (for reference, on a gigabit network, a 130MB file will typically transfer across the network in about 3 seconds. When a Nagle/delayed ACK problem is present, that same transfer requires roughly 45 minutes).

Your network administrators forced the servers to 100Mb/s and it improved performance? This very well could be just a coincidence. If the server is forced to 100Mb/s full duplex but the switch port is not also forced to 100Mb/s full duplex, in most cases the actual connection speed will fall back to the minimum possible configuration – 10Mb/s half duplex. I wonder if the network administrators previously forced the network switch port to 100Mb/s full duplex but forgot to do the same for the server? Still, at 10Mb/s half duplex, the speed (throughput) is probably faster than your WAN link, so that change should not have made much of a difference. The very low latency between the server and switch is still much lower than the latency that you reported with the trace route – and that is what is likely killing performance. I wonder if the network administrators also disabled jumbo frames when they forced the connection speed? Disabling jumbo frames might explain the sudden decrease in time.”
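As a footnote to the jumbo frame comment in that reply, the splitting overhead is easy to quantify (simple arithmetic; the 8KB jumbo payload and ~1460 byte standard payload are assumptions):

----
# How many standard sized packets does one 8KB jumbo frame payload become once
# it has to fit a 1500 byte MTU path?
import math

jumbo_payload = 8192           # bytes carried by an 8KB jumbo frame (assumption)
std_payload   = 1460           # typical payload with a 1500 byte MTU

packets = int(math.ceil(jumbo_payload / float(std_payload)))
print("1 jumbo frame -> %d standard packets" % packets)                            # 6
print("ACKs generated at the default 1-per-2-packets rate: %d" % (packets // 2))   # 3
----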

I mentioned falling back to 10Mb/s half duplex, while the other quotes above seem to indicate that the speed will fall back to 100Mb/s half duplex – maybe some of these switches now consider 100Mb/s half duplex as the minimum.  Even at 10Mb/s half duplex the speed should be enough to fully saturate the 1Mb/s WAN connection… until we consider latency and the ACK frequency.  If it takes an average of 0.9 seconds for the server to see the next request for the next two 1500 byte network packets, that just might slow down the connection a bit.  Someone might have the wise idea of configuring the server to send more than two packets, maybe 16, at a time before waiting for the client to send back a single ACK packet.  Sounds like a good idea, right?  In part three of this series I showed why that might not be such a good idea – especially if only one side of the connection is configured for the modified ACK frequency.  That could never happen in a production environment, right?
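The attraction of the 16-packets-per-ACK idea is easy to see on paper (a sketch assuming ~1460 byte payloads and the roughly 0.1 second round trip suggested by the TRACERT output; it ignores the 1Mb/s WAN cap and the mismatch problem described in part three):

----
# On paper, sending more packets per ACK raises the per-round-trip ceiling.
payload_bytes = 1460
rtt = 0.1                      # seconds, per the TRACERT output
for packets_per_ack in (2, 16):
    kb_per_sec = payload_bytes * packets_per_ack / rtt / 1024.0
    print("%2d packets per ACK -> ceiling of about %5.1f KB/s" % (packets_per_ack, kb_per_sec))
#  2 packets per ACK -> ceiling of about  28.5 KB/s
# 16 packets per ACK -> ceiling of about 228.1 KB/s
----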

The OP provided a screen capture of Wireshark running on his server.  I have attempted to make the screen capture a bit more generic and have marked it up:

I provided the following comment:

“That screen capture (assuming that it was captured from the server) shows that the server sent 16 packets in rapid fire and then waited. 78ms later [indicated by the green boxes in the screen capture] (roughly the ping time that you reported based on the TRACERT output) your server received 8 ACK packets – one for every 2 packets sent by the server (the client side is not configured to ACK after every 8 or 16 packets as the server appears to be configured). You will see a somewhat similar screen capture here where I “optimized” the ACK frequency on one side of the connection with a large array fetch size.

If I am reading the Wireshark screen capture correctly, the 5 second delay [the blue boxes in the screen capture show the roughly 5 second delays] happened after the server received 8 ACK packets (ACKs for the previous 16 packets that the server sent), and before the server continued to send the next batch of packets to the client. The problem appears to be on the server side of the connection – the server is probably waiting for another packet or a timeout. The typical timeout permitted to wait for an expected packet is 0.2 seconds, much less than the 5 second delay. Is it possible that someone “optimized” some of the other network parameters on the Windows server?” 

The OP asked why it appears that the client sent back 8 ACK packets in rapid fire to the server; this is my response:

“It is an illusion that all of the ACK packets are being sent together – well, not really. The ACK packet is a little bit like a method of controlling the speed of a transfer: the sender promises not to send more than (by default) two network packets at a time before receiving confirmation that the two packets were safely received by the recipient. If the sender does not receive an ACK packet, typically within 0.2 seconds, it assumes that the packets were lost in transmission and attempts to resend the packets. Your server is sending out 16 packets before pausing to wait for the return of a single ACK packet to acknowledge that the previous 16 packets were received. The client computer at the other end of the DSL line receives the first two packets 0.04 seconds after the server sent them and replies with an ACK for those two packets. It then immediately sees the next two packets and replies with an ACK for those two. It then immediately sees the next two packets and replies with an ACK for those two. This continues until the client receives the 15th and 16th packets (a Wireshark capture on the client should show this). The server sees the 8 ACK packets (one for every 2 packets sent) coming in all at once (0.04 seconds after being sent by the client), when it is expecting a single ACK for the previous 16 packets. I assume that this is at least part of the cause for the 5 second pauses.”
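A tiny timeline sketch of that explanation (the 0.04 second one-way delay and the ~0.0001 second spacing between the server's back-to-back packets are the figures mentioned earlier; the rest is the default ACK-every-two-packets behavior):

----
# Timeline sketch: the client ACKs every 2 packets as they arrive, so all 8
# ACKs land back at the server within a few milliseconds of each other.
one_way_delay = 0.04      # seconds, per the reply above
send_spacing  = 0.0001    # ~0.1ms between the server's back-to-back packets

for pair in range(8):                             # 16 packets = 8 pairs
    sent_at     = (pair * 2 + 1) * send_spacing   # time the 2nd packet of the pair leaves
    ack_back_at = sent_at + 2 * one_way_delay     # ACK returns one round trip later
    print("ACK for packets %2d-%2d arrives back at the server at t=%.4fs" %
          (pair * 2 + 1, pair * 2 + 2, ack_back_at))
# All 8 ACKs arrive between roughly t=0.0801s and t=0.0815s.
----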

The OP provided a screen capture of his client when it submitted the same SQL statement (likely from a different time period), and mentioned that there are a lot of checksum errors appearing on the client side:

This is my response from the thread:

“The header checksum errors are caused by TCP checksum offloading. You will typically receive that message when a better quality network card is installed in the computer – the packet checksum calculation/verification is handled by the CPU in the network card itself, rather than relying on the host computer’s CPU. I think that one of my blog articles mentions this feature. More information may be found here.

It is interesting that you periodically see 4 packets arriving with a single ACK packet sent in return – possibly some of the packets were received by the client out of order, so the client had to wait for the expected packet to be received. The typical pattern for a database sending packets seems to be one or more TCP packets with “[TCP segment of a reassembled PDU]” in the info column followed by a single TNS packet with “Response, Data (6), Data” in the info column – that TNS packet seems to mark the end of the array fetch (although I may need to perform additional analysis). Based on some of the tests that I have performed with varying the array fetch size, if you cannot track down why the ACK frequency is wrong, you may be able to work around the issue by setting the array fetch size much smaller.

In the above blog article, you can compare the “Optimized” transfer speed as the fetch array size is increased, and also compare it with the unoptimized (no one has changed the ACK frequency) transfer. You will see that as the fetch array size increases, the ACK frequency drops off – this is a setup where I modified the ACK frequency on the client, while I believe that the ACK frequency of your server was adjusted.”
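For anyone wanting to try the smaller array fetch size workaround from the client side, here is a minimal sketch using the cx_Oracle driver (the connect string and table name are placeholders; most other Oracle client libraries and tools expose an equivalent setting):

----
# Minimal sketch: shrink the array fetch size from the client side (cx_Oracle).
import cx_Oracle

connection = cx_Oracle.connect("user/password@dbhost/orcl")   # placeholder credentials
cursor = connection.cursor()
cursor.arraysize = 20          # rows fetched per round trip; TKPROF suggested ~100 previously
cursor.execute("SELECT * FROM heavy_records_table")           # placeholder table
for row in cursor:
    pass                       # process each row
connection.close()
----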

The OP also stated that on the server side of the network he tried a couple of network transfer performance tests.  He copied 640MB of data and saw estimated completion times of 140 to 250 minutes.  He then changed a setting on the server’s network interface and the time dropped to about 4 minutes.  In the forum thread I mentioned the following:

“Something is wrong with the above. On a 100Mb/s connection, I am able to copy a 129MB file to a laptop in about 18 seconds. That suggests a 10Mb/s connection (which you will see when the server and switch are not set to the same speed – if not auto-negotiate) would be able to transfer roughly 12.9MB in the same 18 seconds. Doing the math:

129MB in 18 seconds at 100Mb/s, 12.9MB in 18 seconds at 10Mb/s
640MB/12.9 * 18 = 893 seconds = 14.9 minutes to transfer 640MB over a 10Mb/s connection.
(or 12.9MB/18 = 0.7166MB/s, 640MB / 0.7166MB/s ~ 893 seconds)

Or another way, assuming that the real-world speed of a 10Mb/s connection is close to 7Mb/s:

640MB * 8 bits/byte / 7Mb/s = 731.43 seconds = 12.2 minutes to transfer 640MB over a 10Mb/s connection.

You should not be seeing 140 or 250 minutes (unless your server is not sending ACKs correctly, in which case Wireshark would show periodic 0.2 second delays) – it should be close to 15 minutes.”
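The arithmetic from that reply, expressed as a short script (the 129MB in 18 seconds figure is the measurement quoted above; the 7Mb/s real-world estimate for a 10Mb/s link is an assumption):

----
# Expected time to copy 640MB at 10Mb/s, calculated two ways.
file_mb = 640.0

# Method 1: scale the measured 129MB-in-18-seconds copy down by a factor of 10.
measured_mb, measured_seconds = 129.0, 18.0
rate_at_10mbps = (measured_mb / 10.0) / measured_seconds      # MB/s at one tenth the speed
print("Scaled estimate: %.0f seconds (%.1f minutes)" %
      (file_mb / rate_at_10mbps, file_mb / rate_at_10mbps / 60.0))   # ~893 s, ~14.9 min

# Method 2: assume ~7Mb/s of real-world throughput on a 10Mb/s link.
seconds = file_mb * 8.0 / 7.0
print("7Mb/s estimate:  %.0f seconds (%.1f minutes)" % (seconds, seconds / 60.0))  # ~731 s, ~12.2 min
----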

----

As I mentioned, this OTN thread is interesting, providing us an opportunity to use some of the tools mentioned in earlier blog articles to try to find a resolution to a particularly troubling problem found in the wild.  While the complete cause of the 5 second delay problem has not yet been discovered, it should be an interesting thread to follow.  My vote on the cause of the problem is a combination of delayed ACKs due to a configuration change on the server (the problem has not yet been found in the Windows registry), out-of-order receipt of packets, the server becoming confused at receiving 8 ACK packets when it expected 1, and a couple of network hardware configuration issues (maybe even the use of jumbo packets that are being fragmented), such as having to conform to the 1476 byte MTU of the DSL connection when the network local to the server is configured for a larger MTU size.
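If the configuration change does turn out to be hiding in the Windows registry, the usual suspect would be a per-interface TcpAckFrequency value (that value name is a documented Windows TCP/IP parameter; the script below is only a sketch and must be run on the server itself with Python 3):

----
# Sketch: list any TcpAckFrequency "optimization" on the Windows server's
# network interfaces.  The value is normally absent; when present it changes
# how many packets are received before an ACK is sent.
import winreg

INTERFACES = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, INTERFACES) as root:
    for i in range(winreg.QueryInfoKey(root)[0]):
        guid = winreg.EnumKey(root, i)
        with winreg.OpenKey(root, guid) as interface:
            try:
                value, _ = winreg.QueryValueEx(interface, "TcpAckFrequency")
                print("%s: TcpAckFrequency = %s" % (guid, value))
            except OSError:
                print("%s: TcpAckFrequency not set (default delayed ACK behavior)" % guid)
----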