2.6.9-42.0.2.ELlargesmp #1 SMP x86_64 : network stack hang

From: Satish Chandra Kilaru
Date: Thu Apr 09 2009 - 16:04:18 EST


Hi,
I am facing a strange issue. My Linux machine (2.6.9-42.0.2.ELlargesmp
#1 SMP Thu Aug 17 18:16:22 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux) is
talking to a Windows 2003 machine over a LAN. Linux initiated 4 or 5 TCP
connections to the same Win 2k3 server, and Windows 2003 is transferring
nearly 29 GB of data to Linux. Linux reads the data, verifies a checksum
for every 256K block, and then writes it to disk. The data transfer is
split across multiple streams in order to make it faster.
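
To make the receive path concrete, here is a minimal Python sketch of
the read / verify / write loop that runs on each connection. It is an
illustration only, not the actual application code: the framing (a
4-byte length and a 4-byte checksum in front of each block), the use of
CRC32, and the names read_exact and receive_blocks are all assumptions
made for the example.

```python
import io
import struct
import zlib

BLOCK_SIZE = 256 * 1024  # 256K blocks, as described above


def read_exact(stream, n):
    """Read exactly n bytes from stream; return fewer only at end of stream."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def receive_blocks(stream, out, block_size=BLOCK_SIZE):
    """Copy checksummed blocks from stream to out; return bytes written.

    Hypothetical framing: 4-byte big-endian length, 4-byte CRC32, payload.
    """
    total = 0
    while True:
        header = read_exact(stream, 8)
        if not header:
            return total  # clean end of stream
        if len(header) < 8:
            raise ValueError("truncated block header")
        length, expected = struct.unpack("!II", header)
        if length > block_size:
            raise ValueError("block larger than block_size")
        payload = read_exact(stream, length)
        if len(payload) < length:
            raise ValueError("truncated block")
        if zlib.crc32(payload) & 0xFFFFFFFF != expected:
            raise ValueError("checksum mismatch")
        # While this write is in progress nothing reads from the stream.
        out.write(payload)
        total += length
```

With a real socket, stream could be sock.makefile("rb"). In a loop of
this shape, a write to disk that blocks for a long time also stops the
reads, so unread data can pile up in the socket receive buffer.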
After reading 14.5 GB of data, Linux sends a zero-window message to
Windows.
1) At this point Windows sends a window probe with 1 byte of data.
2) Linux acknowledges this 1 byte but sets the window size to 0 in the
acknowledgement.
3) Win 2k3 sends another window probe with 1 byte.
4) Linux acknowledges this 1 byte but sets the window size to 0 in the
acknowledgement.
5) Win 2k3 sends another window probe with 1 byte.
6) Linux acknowledges this 1 byte but sets the window size to 0 in the
acknowledgement.
7) Win 2k3 sends another window probe with 1 byte.
8) No response from Linux.
9) Win 2k3 sends 4 such window probes with exponentially increasing
delay, but there is no response from Linux.
10) Finally Windows gives up and indicates ConnReset to its application.
11) 15-20 seconds after Windows sent the final window probe, Linux sends
a window update with 2944 bytes.
12) Windows sends an RST in response to this window update.
13) After this, my applications on Windows and Linux start closing
connections, and RSTs are exchanged (not FINs).
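
The zero-window acknowledgements in steps 2, 4 and 6 can be picked out
of a trace by looking at the flags and window fields of each TCP
header. Below is a minimal Python sketch, assuming the raw TCP header
bytes have already been extracted from the capture; the function names
parse_tcp_header and is_zero_window_ack are made up for the example.

```python
import struct

ACK = 0x10  # TCP ACK flag bit
RST = 0x04  # TCP RST flag bit


def parse_tcp_header(header):
    """Return (flags, window) from the fixed 20-byte part of a TCP header."""
    if len(header) < 20:
        raise ValueError("need at least 20 bytes of TCP header")
    # Fields: src port, dst port, seq, ack, data offset + flags,
    # window, checksum, urgent pointer (all in network byte order).
    fields = struct.unpack("!HHIIHHHH", header[:20])
    off_flags, window = fields[4], fields[5]
    return off_flags & 0x01FF, window


def is_zero_window_ack(header):
    """True if ACK is set, RST is clear, and the advertised window is 0."""
    flags, window = parse_tcp_header(header)
    return bool(flags & ACK) and not (flags & RST) and window == 0
```

The window value returned here is the raw 16-bit field, before any
window scaling. A zero window is zero with or without scaling, so the
check itself does not depend on the scale factor.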

Is this normal? Why is Linux not responding to the window probes?
Are there any known TCP/IP stack bugs in Red Hat Enterprise Linux AS
release 4 (Nahant Update 4)?

The CPU did not reach 100%. In fact it was around 84% idle during the
test. top shows the following:
Cpu(s):  4.0% us,  3.4% sy,  0.0% ni, 84.4% id,  7.9% wa,  0.1% hi,  0.2% si
There is not much change in memory statistics, shared memory or disk usage.

Any hints are welcome. Upgrading to a newer OS version is possible only
as a last resort.
What additional info do you need in order to solve this? Network
traces were collected on a third machine by using a smart switch.
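
One more thing that can be collected on the Linux side during the stall
is the per-socket queue sizes from /proc/net/tcp. The following is a
minimal Python sketch for reading them, assuming the usual column
layout (tx_queue:rx_queue as the fifth whitespace-separated field) and
a little-endian host such as x86_64; parse_proc_net_tcp is a made-up
helper name.

```python
import socket
import struct


def _addr(hex_addr):
    """Convert 'AABBCCDD:PPPP' to 'a.b.c.d:port' (little-endian host)."""
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return "%s:%d" % (ip, int(port_hex, 16))


def parse_proc_net_tcp(text):
    """Yield (local, remote, tx_queue, rx_queue) for each IPv4 socket line."""
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue
        local, remote, queues = fields[1], fields[2], fields[4]
        tx_hex, rx_hex = queues.split(":")
        yield _addr(local), _addr(remote), int(tx_hex, 16), int(rx_hex, 16)
```

It can be run as parse_proc_net_tcp(open("/proc/net/tcp").read()). If
rx_queue stays large on the affected connections while the window is
0, that would suggest the application is not draining the socket; if
it is small, the problem is more likely below the application.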

OS Version:
-----------------
oracle@rcortest01> uname -a
Linux rcortest01.riverside.tld 2.6.9-42.0.2.ELlargesmp #1 SMP Thu Aug
17 18:16:22 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux

oracle@rcortest01> cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 4)

Thanks in advance.
-Satish