Re: 2.0.27 major problems #1 -- 3c59x driver.

Richard B. Johnson (root@analogic.com)
Thu, 13 Feb 1997 08:21:07 -0500 (EST)


On Thu, 13 Feb 1997, Cameron MacKinnon wrote:

> > From: Chris Evans <chris@ferret.lmh.ox.ac.uk>
> > On Wed, 12 Feb 1997, Philip Blundell wrote:
> > > A transmitter access conflict is not disaster. There is no need to
> > > reinitialise the controller - all it means is that the driver's
> > > transmit routine was reentered, and the second transmit was deferred to
> > > avoid contention.
> >
> > I am forced to disagree -- when your card hangs it certainly _is_ a
> > disaster. Additionally, the code implies that if execution reaches
> > this stage it is a disaster anyway; quote "if this ever happens then the
> > queue layer is doing something evil"
>
> NOT being an expert in the Linux networking code, a few disinterested
> observations:
> - Maybe the evil IS in the queue layer, and others haven't noticed as
> their ethernet performance isn't as stellar as yours. Do the errors
> occur randomly, or only under high load?
[SNIPPED]

The "doing something evil" is quite often the queue layer trying to
transmit another packet before the last one has been transmitted. Older
chips would not return "good" status until the packet was actually
transmitted. When you tell one of the more modern chips, i.e., NOT the
8390, to transmit a packet, it only "promises" to get it transmitted: it
returns "good" status as soon as the packet is ready in its buffer, and
does automatic retries to handle collisions. This allows the CPU running
the driver to do something else while the packet is actually being sent.
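
As a rough illustration of that "promise" behaviour, here is a toy model
in Python. It is not driver code; the class name ToyNic and its methods
are invented for this sketch, and the single transmit buffer is an
assumption. It only shows why a second transmit request that arrives
before the completion interrupt has to be deferred.

```python
class ToyNic:
    """Toy model of a NIC with one on-chip transmit buffer (hypothetical)."""

    def __init__(self):
        self.tx_busy = False   # True while a packet sits in the chip's buffer
        self.sent = []         # packets that actually went out on the wire
        self._pending = None

    def start_xmit(self, packet):
        """Hand a packet to the chip; True if accepted, False if deferred."""
        if self.tx_busy:
            # Previous packet still in flight: the access-conflict case.
            return False
        self._pending = packet
        self.tx_busy = True
        return True            # "good" status is only a promise to transmit

    def tx_done_interrupt(self):
        """The chip reports that the buffered packet has been transmitted."""
        if self._pending is not None:
            self.sent.append(self._pending)
            self._pending = None
        self.tx_busy = False


nic = ToyNic()
first = nic.start_xmit(b"packet-1")    # accepted
second = nic.start_xmit(b"packet-2")   # deferred, transmitter still busy
nic.tx_done_interrupt()                # first packet has left the wire
retry = nic.start_xmit(b"packet-2")    # accepted on the retry
```

In this model the deferred packet is not lost; the caller simply has to
offer it again once the transmitter is free.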

However, most hardware hangs occasionally. On all the Ethernet chips I've
programmed, the lowest-overhead fix is to reset the thing and start over.
Even the chips that must be put into loop-back mode during initialization
take less than a typical packet transmission time to reprogram. The result
is one lost packet, which will be re-sent when requested.
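
The reset-and-start-over approach can be sketched with a simple watchdog.
Again this is a toy model in Python with invented names (HangingNic,
timeout_ticks), and the "hang" is simulated by never delivering a
completion interrupt. The assumption is that some periodic timer calls
tick().

```python
class HangingNic:
    """Toy NIC whose transmitter never completes, to exercise the watchdog."""

    def __init__(self, timeout_ticks=3):
        self.timeout_ticks = timeout_ticks  # hypothetical watchdog limit
        self.tx_busy = False
        self.busy_ticks = 0
        self.resets = 0
        self.dropped = 0

    def start_xmit(self, packet):
        """Accept a packet unless the transmitter is still busy."""
        if self.tx_busy:
            return False
        self.tx_busy = True
        self.busy_ticks = 0
        return True

    def reset(self):
        """Reinitialise the chip; the packet in its buffer is lost."""
        self.tx_busy = False
        self.busy_ticks = 0
        self.resets += 1
        self.dropped += 1

    def tick(self):
        """Periodic timer: reset if the transmitter has been busy too long."""
        if not self.tx_busy:
            return
        self.busy_ticks += 1
        if self.busy_ticks >= self.timeout_ticks:
            self.reset()


nic = HangingNic(timeout_ticks=3)
nic.start_xmit(b"packet-1")            # chip "hangs" on this packet
for _ in range(3):
    nic.tick()                         # third tick triggers the reset
accepted_after_reset = nic.start_xmit(b"packet-2")
```

The cost in this sketch is exactly one dropped packet per reset; recovery
of that packet is left to whatever protocol sits above the driver.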

Under high load, I often get the "couldn't allocate a sk_buff" and
"memory squeeze, dropping packet" errors. They really should be displayed
only when debugging is turned on. The correct operation to perform when
resources are not available is to drop the packet (which is what many
drivers do). However, some drivers attempt heroics, including busy-waiting.
This is most often counter-productive.
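
A minimal sketch of the drop-on-failure policy, again as a Python toy
model rather than kernel code. BufferPool, receive() and the stats keys
are names made up for this example; the pool stands in for whatever
allocator the driver uses. When no buffer is available, the frame is
counted as dropped and the function returns at once, with no retry loop.

```python
class BufferPool:
    """Fixed-size pool standing in for the packet-buffer allocator."""

    def __init__(self, size):
        self.free = size

    def alloc(self, length):
        """Return a buffer of the given length, or None if the pool is empty."""
        if self.free == 0:
            return None
        self.free -= 1
        return bytearray(length)


def receive(pool, stats, frame):
    """Copy an incoming frame into a buffer, or drop it if none is free."""
    buf = pool.alloc(len(frame))
    if buf is None:
        stats["rx_dropped"] += 1   # count the drop quietly, do not busy-wait
        return None
    buf[:] = frame
    stats["rx_packets"] += 1
    return buf


pool = BufferPool(size=2)
stats = {"rx_packets": 0, "rx_dropped": 0}
results = [receive(pool, stats, f"frame-{i}".encode()) for i in range(3)]
```

With a pool of two buffers and three incoming frames, the third frame is
dropped and only the counter records it.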

I have some tools that record TCP/IP transmission/roundtrip speeds. If
anyone is interested, contact me via private email.

Cheers,
Dick Johnson
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Richard B. Johnson
Project Engineer
Analogic Corporation
Voice : (508) 977-3000 ext. 3754
Fax : (508) 532-6097
Modem : (508) 977-6870
Ftp : ftp@boneserver.analogic.com
Email : rjohnson@analogic.com, johnson@analogic.com
Penguin : Linux version 2.1.26 on an i586 machine (66.15 BogoMips).
Warning : It's hard to remain at the trailing edge of technology.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-