The delay seems about normal; see below.
>Incidentally I get similar but not as severe behaviour from a Linux
>1.3.85 box (830cps ftp, very clunky text, but only occasionally freezing
>for >1s like solaris) and an OSF/1 v3.2 box (917cps ftp, clunky text,
>never freezes). I haven't done tcpdumps for these, but I will if people
>really need it. Actually I'm not sure I have a working tcpdump for OSF;
>my memory is it simply misses a lot of packets for no adequate reason.
>Oh well.
It seems to me that SUNOS is choosing to retransmit packets much
too early. Say a 1460-byte packet gets injected into the network by
staff at time T and then makes its way to your machine.
Counting just the transmission time over the modem, I'd expect a
delay of about 1 second between the time staff sends the packet and
the time the Linux box can see it and ACK it. If the Linux box waits
another 1/2 second before sending the ACK (which is about what we are
doing now) and then sends it back over the modem, you'd expect pretty
much the round-trip delay on the ACK that you are seeing.
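To make that arithmetic concrete, here is a rough timing model. It is only a sketch: the link speed of about 1460 bytes/s (picked so that one full-size segment takes roughly the 1 second estimated above), the 40 bytes of TCP/IP headers, and the helper names `serialization_time` and `expected_ack_rtt` are all hypothetical, and PPP framing, propagation delay, and queueing are ignored.

```python
# Rough timing model for one full-size TCP segment crossing a slow modem link.
# ASSUMPTIONS (illustrative, not measured): ~1460 bytes/s effective throughput,
# 40 bytes of IPv4+TCP headers per packet, no PPP framing, propagation delay,
# or queueing.

LINK_BYTES_PER_SEC = 1460.0  # hypothetical effective modem throughput
HEADER_BYTES = 40            # 20-byte IPv4 header + 20-byte TCP header
DELAYED_ACK_SEC = 0.5        # receiver's delayed-ACK wait, as in the text


def serialization_time(payload_bytes: int) -> float:
    """Seconds needed to clock one packet through the modem."""
    return (payload_bytes + HEADER_BYTES) / LINK_BYTES_PER_SEC


def expected_ack_rtt(payload_bytes: int) -> float:
    """Data packet in, delayed-ACK wait, bare ACK (no payload) back out."""
    return (serialization_time(payload_bytes)
            + DELAYED_ACK_SEC
            + serialization_time(0))


if __name__ == "__main__":
    rtt = expected_ack_rtt(1460)
    print(f"expected ACK round trip: {rtt:.2f} s")
    print("1/2 s resend fires before the ACK can arrive:", 0.5 < rtt)
```

Under these assumptions the ACK cannot come back in much less than about 1.5 s, roughly three times the 1/2 second after which staff starts resending.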
Looking at staff's output log, it starts the first resend 1/2 second
after the first packet is sent. There is no way that fuzzbox could
even have received the packet by then, never mind have responded.
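For comparison, a sender that derives its retransmission timeout from measured round-trip times should not resend this early. The sketch below uses the Jacobson/Karels smoothed-RTT estimator in the form later written down in RFC 6298 (alpha = 1/8, beta = 1/4, K = 4, 1 second floor). It is not a claim about what SUNOS or Linux actually implement; the class name `RtoEstimator`, the 10 ms clock granularity, and the RTT samples of about 1.5 s are illustrative assumptions consistent with the delays described above.

```python
# Jacobson/Karels RTO estimator, following the rules in RFC 6298.
# ASSUMPTIONS: 10 ms clock granularity; RTT samples are illustrative only.
from typing import Optional

ALPHA = 1 / 8             # gain for the smoothed RTT
BETA = 1 / 4              # gain for the RTT variation
K = 4                     # variance multiplier
CLOCK_GRANULARITY = 0.01  # G, in seconds (assumed)
MIN_RTO = 1.0             # RFC 6298 lower bound on the RTO, in seconds


class RtoEstimator:
    def __init__(self) -> None:
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None
        self.rto = 1.0  # initial RTO before any RTT sample (RFC 6298)

    def sample(self, r: float) -> float:
        """Fold one RTT measurement (seconds) in and return the new RTO."""
        if self.srtt is None or self.rttvar is None:
            # First measurement: SRTT = R, RTTVAR = R/2.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated using the *old* SRTT, then SRTT itself.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        self.rto = max(MIN_RTO,
                       self.srtt + max(CLOCK_GRANULARITY, K * self.rttvar))
        return self.rto


if __name__ == "__main__":
    est = RtoEstimator()
    for r in (1.55, 1.60, 1.50):
        print(f"RTT sample {r:.2f} s -> RTO {est.sample(r):.2f} s")
```

With round trips of about 1.5 s, the computed timeout stays several times larger than the 1/2 second seen in staff's log.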
Now, this is all complicated by things getting into what I've
been calling a "fast retransmit" war. We see a stream of duplicate
packets on the Linux side and ACK them all with the same ACK number;
the next thing we know, Solaris thinks it needs to resend these packets.
Now we see another stream of duplicate packets on the Linux side, and
the cycle repeats. The war is self-damping, but it takes quite a while
to die out.
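The trigger for the war is the sender's duplicate-ACK counter. Below is a simplified model of fast retransmit as commonly implemented in BSD-style stacks, with the conventional threshold of three duplicate ACKs (as described in RFC 2001). The names `FastRetransmitSender` and `receiver_acks_for_duplicates` are hypothetical, and the simplifications (no congestion window, one retransmission per trigger, no check for outstanding data) are assumptions for illustration, not a description of the Solaris or Linux code.

```python
# Simplified sender-side fast retransmit (BSD/Reno style).
# ASSUMPTIONS: threshold of 3 duplicate ACKs; an ACK counts as a duplicate
# only if it acknowledges nothing new AND advertises an unchanged window;
# congestion-window handling and the outstanding-data check are omitted.

DUPACK_THRESHOLD = 3


class FastRetransmitSender:
    def __init__(self, snd_una: int, peer_window: int) -> None:
        self.snd_una = snd_una          # oldest unacknowledged sequence number
        self.peer_window = peer_window  # last window advertised by the peer
        self.dupacks = 0
        self.fast_retransmits = 0

    def on_ack(self, ack: int, window: int) -> None:
        if ack == self.snd_una and window == self.peer_window:
            self.dupacks += 1
            if self.dupacks == DUPACK_THRESHOLD:
                # Assume the segment at snd_una was lost and resend it now,
                # without waiting for the retransmission timer.
                self.fast_retransmits += 1
        else:
            if ack > self.snd_una:
                self.snd_una = ack
            self.dupacks = 0
        self.peer_window = window


def receiver_acks_for_duplicates(rcv_nxt: int, window: int, copies: int):
    """Each duplicate data segment is answered with the same ACK number and
    the same window, which the sender sees as a run of duplicate ACKs."""
    return [(rcv_nxt, window)] * copies


if __name__ == "__main__":
    # The first ACK has already advanced snd_una to 10_000; the needless
    # resends then arrive and each one draws an identical ACK.
    sender = FastRetransmitSender(snd_una=10_000, peer_window=8_192)
    for ack, win in receiver_acks_for_duplicates(10_000, 8_192, copies=5):
        sender.on_ack(ack, win)
    print("duplicate ACKs seen:", sender.dupacks)
    print("spurious fast retransmits:", sender.fast_retransmits)
```

Five needless resends are enough to push the sender past the threshold, so it resends data that was never lost, which in turn produces more duplicates at the receiver.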
If we could avoid starting it in the first place, we wouldn't have
nearly as much trouble. However, I haven't been able to think of
any nice way to avoid this problem that conforms to the RFCs.
[One possibility that I want to try soon is to ACK duplicates
of packets that we have already received with a sequence of
ACKs whose advertised windows increase by one byte each time.
This should defeat the unneeded fast retransmit on the Solaris end;
however, it is not clear that this is a good idea in general, or
that it is easy to implement.]
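The bracketed idea depends on how a sender decides that an ACK is a duplicate. In 4.4BSD's `tcp_input`, a pure ACK only counts as a duplicate when the advertised window is unchanged, and RFC 5681 later made the unchanged window part of the definition. The sketch below, using the hypothetical helpers `acks_for_duplicates`, `is_duplicate_ack`, and `count_duplicates`, shows the receiver answering each duplicate segment with the same ACK number but a window one byte larger than the last. Whether the Solaris stack applies the same window check is an assumption here.

```python
# Receiver-side variant: bump the advertised window by one byte on each ACK
# sent in response to an already-received (duplicate) segment.
# ASSUMPTIONS: the sender treats an ACK as a duplicate only when the ACK
# number equals its snd_una AND the window equals the last window it saw
# (4.4BSD behaviour, and part of the RFC 5681 definition).
from typing import List, Tuple


def acks_for_duplicates(rcv_nxt: int, window: int, copies: int,
                        bump: bool) -> List[Tuple[int, int]]:
    """ACK number and advertised window sent for each duplicate segment."""
    return [(rcv_nxt, window + (i + 1 if bump else 0)) for i in range(copies)]


def is_duplicate_ack(ack: int, window: int,
                     snd_una: int, last_window: int) -> bool:
    """Sender-side test: no new data acknowledged and no window change."""
    return ack == snd_una and window == last_window


def count_duplicates(acks: List[Tuple[int, int]],
                     snd_una: int, last_window: int) -> int:
    dup = 0
    for ack, window in acks:
        if is_duplicate_ack(ack, window, snd_una, last_window):
            dup += 1
        last_window = window  # the sender remembers the latest window
    return dup


if __name__ == "__main__":
    plain = acks_for_duplicates(10_000, 8_192, copies=5, bump=False)
    bumped = acks_for_duplicates(10_000, 8_192, copies=5, bump=True)
    print("plain ACKs counted as duplicates: ",
          count_duplicates(plain, 10_000, 8_192))
    print("bumped ACKs counted as duplicates:",
          count_duplicates(bumped, 10_000, 8_192))
```

One likely drawback is that moving the right window edge by a single byte at a time runs against the receiver-side silly window syndrome avoidance rule in RFC 1122, which may be part of why this does not sit comfortably with the RFCs.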
It is possible that shortening the delayed-ACK timeouts will alleviate
this problem. I'm currently testing this possibility.
In any case, I've found that it damps out after about 1/2 meg of transfers
with a 1500 MTU/MRU setting on the PPP link. You can get it to damp out
much faster by choosing an MTU/MRU of 296. With this setting I am
currently seeing transfer rates of about 1.5Kbps both to and
from a Solaris box; however, this is with a patched 1.3.88 kernel
that has some fixes for fast retransmit that haven't been sent to
Linus yet. I'm hoping to get these patches out the door later today.
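The difference between the two MTU/MRU settings is easier to see with a little arithmetic. This sketch assumes about 1460 bytes/s of modem throughput and 40 bytes of TCP/IP headers, ignores PPP framing, and uses a hypothetical helper `per_packet_stats`. It only shows how long each full packet occupies the link and how much of it is header; it does not by itself explain why the war damps out faster at 296.

```python
# Per-packet cost of two PPP MTU/MRU settings on a slow modem link.
# ASSUMPTIONS: ~1460 bytes/s effective throughput, 40 bytes of IPv4+TCP
# headers, PPP framing and header compression ignored.
from typing import Tuple

LINK_BYTES_PER_SEC = 1460.0  # hypothetical effective modem throughput
HEADER_BYTES = 40            # 20-byte IPv4 header + 20-byte TCP header


def per_packet_stats(mtu: int) -> Tuple[int, float, float]:
    """Return (MSS in bytes, seconds on the wire, header fraction)."""
    mss = mtu - HEADER_BYTES
    seconds = mtu / LINK_BYTES_PER_SEC
    header_fraction = HEADER_BYTES / mtu
    return mss, seconds, header_fraction


if __name__ == "__main__":
    for mtu in (1500, 296):
        mss, seconds, overhead = per_packet_stats(mtu)
        print(f"MTU {mtu:4d}: MSS {mss:4d} bytes, "
              f"{seconds:.2f} s per full packet, "
              f"{overhead:.1%} header overhead")
```

Under these assumptions a full 296-byte packet clears the modem in about 0.2 s instead of about 1 s, at the cost of spending roughly five times as large a fraction of each packet on headers.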
As an aside, looking at Linux-to-Linux transfers, I am seeing a
similar problem: we start out resending much too fast.
I'm looking for the cause now, but I have not yet tracked it down.
>With older kernels this does get 1.5+Kb/s, so don't blame the NetBSD box :-)
Hmm. I'd like to see what the timing looks like with these older kernels.
(Personally, I get lousy performance against SUNOS machines on every
Linux kernel.)
Is it possible for you to boot up one of these older kernels and send
me the packet dumps?
-- eric
---------------------------------------------------------------------------
Eric Schenk www: http://www.cs.toronto.edu/~schenk
Department of Computer Science email: schenk@cs.toronto.edu
University of Toronto