Re: pre-2.0.31 & network stalls

Eric.Schenk@dna.lth.se
Fri, 06 Jun 1997 00:10:58 +0200


Manfred Petz <pm@radawana.cg.tuwien.ac.at> writes:
>
>The 2.0.30 kernel used to lock up on TCP connections from the
>Linux side to the SVR4.2. _Very_ often. When I changed to another
>VC and made a ping(1) to the SVR4.2 box, the connection awakened
>again. And making a TCP connection (TELNET) from the SVR4.2 box
>to the Linux machine had also no problems.
>
>Under pre-2.0.31 it seemed that this problem has gone. Until now.

I need some clarification here. Was the problem you observed
between the SVR4.2 box and the Linux box over an Ethernet wire,
or over a PPP connection? If over an Ethernet wire, what kind
of hardware is involved?

>Connections to our PPP clients lock up from time to time. It seems
>that connections to the SVR4.2 have no problems. I can't say that for
>sure, however. It seems that connections tend to lock-up when the
>PPP client tries to transfer a large amount of data (a big file
>with FTP while making other short-living connections, like
>playing around with www. The lockups happen on any PPP modem
>line and also with only one client connected.

Lockups on the PPP link _sound_ like a broken modem, or a broken
VJ compression algorithm. Questions:

(1) You're not using US Robotics Sportsters, are you?

(2) Try turning off VJ compression. You might be surprised.
(Broken implementations include such popular systems as
various releases of Shiva LanRover, Annex, and others....)

>Because the TCP stalls under 2.0.30 and pre-2.0.31 happen
>between both the SVR4.2 _and_ the PPP clients, it can't be a
>network hardware related problem.

Don't bet on it. :) There are more hardware problems out there
than you might imagine. Almost every report I get about TCP freezing
ends up coming down to broken hardware, or configuration problems.
(I'm discounting reports about TCP freezing due to 20-50% packet
loss rates on the Internet at large. Heck, that's no surprise. The
surprise is that you manage to start a conversation when loss
rates are that high.)

>o Could it be a problem with ARP and/or proxy ARP? I wonder about
> the 'ping' effect?

This is a strong possibility.

>o If I can help tracing this down, tell me (how).

Yes, you can help. The main things that I would need are
"tcpdump -n -S -tt" dumps on the interface(s) showing the problem.
Try to capture entire conversations. More than one if possible.
Capturing the "ping" effect would be especially interesting.
A full rundown of the link hardware involved is also important,
for both sides of the link in the case of the PPP link.
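
If the dumps get long, something like the following rough sketch may
help locate the stalls before mailing them. It assumes each packet
line in the "tcpdump -n -S -tt" output starts with an epoch timestamp
(seconds, a dot, then fractional digits) and simply reports silent
gaps. The function name find_stalls and the 5 second threshold are
illustrative choices of mine, not part of any existing tool.

```python
import re
import sys

# Assumed line shape for "tcpdump -n -S -tt" output: an epoch timestamp
# such as "865548658.123456", whitespace, then the packet summary.
TS_RE = re.compile(r"^(\d+\.\d+)\s+(.*)$")


def find_stalls(lines, min_gap=5.0):
    """Return (gap_seconds, line_before, line_after) for every silent
    period of at least min_gap seconds between consecutive packets."""
    stalls = []
    prev_ts = None
    prev_line = None
    for raw in lines:
        line = raw.rstrip("\n")
        m = TS_RE.match(line)
        if not m:
            continue  # skip anything that is not a timestamped packet line
        ts = float(m.group(1))
        if prev_ts is not None and ts - prev_ts >= min_gap:
            stalls.append((ts - prev_ts, prev_line, line))
        prev_ts, prev_line = ts, line
    return stalls


# Only runs when a dump file is named on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for gap, before, after in find_stalls(f):
            print("stall of %.1f s" % gap)
            print("  last packet before: " + before)
            print("  first packet after: " + after)
```

For example, save a capture with "tcpdump -n -S -tt -i ppp0 > dump.txt"
(ppp0 is only a placeholder for whichever interface shows the problem)
and run the script on dump.txt. The packets on either side of each
reported gap are the ones I would look at first, but please still send
the complete dump.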

Seriously, I have no outstanding reports of performance problems with
TCP that can be linked to a problem in TCP. If you have found such
a problem I need to know about it in as much detail as possible in
order to fix it. On the other hand, please make every effort to be
sure that it really is a TCP problem. There are only so many of us
working on the network code, and we don't have time to debug everyone's
configuration/hardware problems.

Cheers,

-- 
Eric Schenk                               www: http://www.dna.lth.se/~erics
Dept. of Comp. Sci., Lund University          email: Eric.Schenk@dna.lth.se
Box 118, S-221 00 LUND, Sweden   fax: +46-46 13 10 21  ph: +46-46 222 96 38