TCP latency problem with newer kernels 2.4.18 etc (and not 2.4.9)

From: Eric Cano (Eric.Cano@cern.ch)
Date: Fri Aug 15 2003 - 12:22:30 EST



Hello,

We have a general TCP problem and we think that you are the right list
to ask about it. If this is not correct, please point us to the person
who might be able to provide help.

In short, the problem is a very large latency when sending messages over
TCP sockets if the message size is between about 500 and 1500 bytes.

The test consists of the classical round trip (aka ping-pong) measurement
where one node sends a message to a mirror node which receives the message
and sends it back to the originator node. This ping-pong test is repeated
in a loop and the total round trip time is measured. The OWL (one-way
latency) is half the total round trip time and hence corresponds to the
latency of a send and receive operation.
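
For readers who want to reproduce the measurement, the ping-pong test
described above can be sketched roughly as follows. This is only an
illustration in Python, not the program used for the attached plots. It
assumes both nodes run on one machine over the loopback interface, and the
names recv_exact, mirror and measure_owl_us are made up for this example.

```python
import socket
import threading
import time


def recv_exact(sock, n):
    """Read exactly n bytes from sock (TCP may deliver a message in pieces)."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def mirror(listener, size, rounds):
    """Mirror node: receive each message and send it straight back."""
    conn, _ = listener.accept()
    with conn:
        for _ in range(rounds):
            conn.sendall(recv_exact(conn, size))


def measure_owl_us(size, rounds=200):
    """Return the mean one-way latency in microseconds for one message size."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # assumption: loopback, ephemeral port
    listener.listen(1)
    worker = threading.Thread(target=mirror, args=(listener, size, rounds))
    worker.start()

    payload = b"x" * size
    with socket.create_connection(listener.getsockname()) as sock:
        start = time.perf_counter()
        for _ in range(rounds):
            sock.sendall(payload)      # ping
            recv_exact(sock, size)     # pong
        elapsed = time.perf_counter() - start

    worker.join()
    listener.close()
    # OWL is half of the mean round trip time, converted to microseconds.
    return elapsed / rounds / 2 * 1e6
```

Calling measure_owl_us(size) for a range of sizes (for example 100 to 3000
bytes) gives a curve of the same kind as the attached plots. Over loopback
the absolute numbers will of course differ from those seen between two
hosts with Gigabit Ethernet NICs.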

We have observed the problem on various hardware with Gigabit Ethernet
NICs, in particular a Pentium III PC with an Alteon AceNIC as well as a dual
Xeon Pentium IV with Intel 1000 Pro NICs. Standard socket settings were used.

We will present the problem by comparing the test results obtained with
two kernels: 2.4.9 and 2.4.18. (For your information, kernel 2.4.20 gives
the same result as 2.4.18. We have not tried the 2.4.x kernels between
2.4.9 and 2.4.18, so we cannot tell exactly in which release the change in
behaviour occurred.)
To be more precise, the 2.4.18 kernel is actually 2.4.18-27.7.x.cernsmp,
which is a derivative of 2.4.18-27.7 (RedHat), itself being a derivative
of the 'standard' 2.4.18. So the kernel we are using contains O(100)
patches from RedHat, plus ~20 patches from CERN. The file listing the
patches applied in the build process is attached.

Attached are two plots of the measured OWL (on a Pentium III PC with an
Alteon AceNIC). The x-axis is the message size in bytes and the y-axis is
the OWL in microseconds (us).
For kernel 2.4.9 the result looks reasonable. To first order the OWL can
be described as a linear function with an offset of about 160 us and a slope
corresponding to 88 MB/s. There is however a curious dip between about
500 and 1500 bytes.
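
To make the first-order description concrete, the linear model can be
written down as a small function. This is only a sketch: it assumes that
1 MB means 10^6 bytes (so 88 MB/s is 88 bytes per microsecond), and the
name model_owl_us is made up for this example.

```python
def model_owl_us(size_bytes, offset_us=160.0, bandwidth_mb_per_s=88.0):
    """First-order OWL model: fixed offset plus size divided by bandwidth.

    Assumes 1 MB = 10**6 bytes, so a rate in MB/s equals bytes per microsecond.
    """
    bytes_per_us = bandwidth_mb_per_s
    return offset_us + size_bytes / bytes_per_us
```

With these numbers the model predicts about 166 us for a 500 byte message
and about 177 us for a 1500 byte message, so the size-dependent part is
small compared to the fixed offset over the range of interest.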
On the other hand, the kernel 2.4.18 result is erratic for sizes between
about 500 and 1500 bytes. It looks as if messages of these sizes are only
sent after a long timeout of about 20,000 us (= 20 ms) has expired.

The question is whether there is a fix for this problem in more recent
kernel versions, or whether there is a parameter we can set.

Best Regards, Frans Meijers and colleagues.

Attachment: build-2.4.18-27.7.x.cern.cern.log
Description: Binary data

Attachment: OWL2418.gif
Description: GIF image

Attachment: OWL249.gif
Description: GIF image