Re: [BUG] Transmit-timed-outs with 2.4.0test 3c59x drivers

From: Aaron Lehmann (aaronl@vitelus.com)
Date: Sat Aug 26 2000 - 01:21:00 EST


On Sat, Aug 26, 2000 at 03:57:35PM +1000, Andrew Morton wrote:
> Aaron Lehmann wrote:
> >
> > I've been really enthusiastic about playing with 2.4.0-test*, but one
> > problem that I've been experiencing has been making it unusable. After
> > a few hours of uptime with normal workstation network activity (NFS,
> > FTP, web), my 3c905 network interface goes down completely. When I
> > switch to a virtual console, I get messages saying "NETDEV WATCHDOG:
> > Transmit timed out".
> > ...
>
> Are you running on a busy, 10 Mbps hubbed LAN?

Your guess is correct. Actually the 10 Mbps part is temporary, as my
100 Mbps hub doesn't seem to be working reliably. I wouldn't call it
busy, although for 10 Mbps it might be. There are only three nodes on
the network and only two of them actually exchange significant amounts
of data.

> The driver has a problem recovering from 16 successive
> collisions. Fixed in 2.2 but still in private testing for
> 2.4 - David Hinds found some peculiarities in the Cardbus
> NIC's handling of this situation which complicated things.
>
> It was quite an elusive problem until David showed that
> running `ping -f -s 50000 victim' from a high-performance
> NIC on another machine produces a shower of errors. This
> test produces so many collisions that the driver actually
> gets _legitimate_ tx timeouts: it can't send any packets
> at all in a 400 millisec time period.
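
For anyone wondering why "16 successive collisions" is a special
number: as far as I know, standard half-duplex Ethernet backs off
exponentially after each collision and abandons the frame after the
16th. Below is a rough sketch of the worst-case backoff for a single
frame. The constants are the usual 802.3 values as I understand them;
they are assumptions and are not taken from the 3c59x driver or the
3c905 documentation, and the function names are made up for
illustration.

```python
# Assumed 802.3 half-duplex parameters (not taken from the 3c59x driver).
SLOT_BITS = 512        # slot time, in bit times
ATTEMPT_LIMIT = 16     # frame is abandoned after this many collisions
BACKOFF_LIMIT = 10     # backoff exponent stops growing after this many


def max_backoff_slots(collision: int) -> int:
    """Largest backoff, in slots, allowed after the given collision (1-based)."""
    return 2 ** min(collision, BACKOFF_LIMIT) - 1


def worst_case_backoff_seconds(bits_per_second: int) -> float:
    """Worst-case total backoff for one frame that collides on every attempt."""
    # No backoff follows the final collision: the frame is dropped instead,
    # so only collisions 1 .. ATTEMPT_LIMIT - 1 contribute.
    slots = sum(max_backoff_slots(n) for n in range(1, ATTEMPT_LIMIT))
    return slots * SLOT_BITS / bits_per_second


if __name__ == "__main__":
    # 10 Mbps hubbed LAN, as in this report.
    print(f"{worst_case_backoff_seconds(10_000_000) * 1000:.1f} ms")
```

Under those assumptions the worst case at 10 Mbps comes out at roughly
366 ms of backoff for one frame, before counting any time actually
spent on the wire. That is only an upper bound and says nothing about
how the driver picks its timeout, but it does make a 400 millisec
window without a single successful transmit look plausible on a
heavily colliding segment.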

I am glad I am the network admin here. If I weren't, I would probably
be in big trouble. Ping-flooding my workstation from the network server
is usually not a good idea. The mistake I made was initiating the ping
over SSH from the workstation (with the buggy kernel), so as soon as
packets started arriving the machine dropped off the net with Tx
errors as you predicted, and I had no way to stop the pings from the
server. My second mistake was rebooting into 2.4.0-test7 instead of
a kernel with a reliable NIC driver :). As soon as the interface was
brought up, the pings killed it. I booted a 2.2 kernel, sshed in and
killed the ping process.

> Please try this patch against 2.4.0-test7 and let me know
> how it goes.

It seems to work!!! When ping-flooding my machine as you described
above, I get "Transmit timed out" errors every 10-20 seconds, but they
appear to be legitimate: the network still operates and nothing else
on the machine seems to be affected.

Thanks so much for helping me out with this. I hope you can help
countless other people too by getting the right people to include a
fix in 2.4.

Aaron Lehmann