2.2 TCP stall : still there but I've clues now

From: Willy Tarreau (willy@novworld.Novecom.Fr)
Date: Sat Mar 18 2000 - 12:08:52 EST


Hi all

I've just tested 2.2.15pre14aa2 and the TCP stall I had experienced since
at least 2.2.14 is still there. But now I've nearly find how it arises.

I can now reproduce it with a Sun Ultra 5/Solaris 2.6, and not only the
AlphaServer. The problem comes on every transfert performed at 100 Mbps
from the client to the server, more precisely each time my linux box receives
a 100 Mbps data stream (at least say a high sustained rate).

I've caught many traces with tcpdump from my linux box, and each one report
that the kernel doesn't acknowledge some packets. Even on a 30 kB file, I get
a loss. I wondered if this could be due to a bad checksum, but I don't know
how I could have indication of bad IP and/or TCP checksum ...

I thought about a cable problem, but I've tried several ones and the problem's
still there.

For a last test, I had the idea to slow down the traffic by inserting 1000
ipchains rules in the input chain. And now, I can get a full rate transfer
without any loss !

So it seems that when the kernel receives too many packets in a small amount
of time, it occasinnaly looses one. Perhaps is it due to a buffer full, or
one packet being received while another one being processed ... don't know.

Now that I know I don't need the alphaserver to do some more tests, I can
make some every evening, but I must say I don't have any more ideas :-(

If someone could suggest me things to patch, modify or simply look at, I'll
do. At least, how to detect bad checksums.

I'll soon put all the logs online at http://www-miaif.lip6.fr/willy/tcplogs/

The Sun is at 10.100.11.4 and the notebook at 10.100.11.3.

My notebook is a K6-2/450+64 MB, with many kernels from 2.2.14 to 2.2.15pre14aa2
including many monster patches I often make from Alan's ones. But now I know
the problem doesn't come from my patches.

Ah, important : my network card is a cardbus Xircom CB2-10/100. I'd like to
test with another type, but I don't have now.

Well, I hope this can help in finding the bug - if it's a bug -.

have all a nice week-end,

Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Mar 23 2000 - 21:00:24 EST