Re: 2.0.27 major problems #1 -- 3c59x driver.

Richard B. Johnson (root@analogic.com)
Tue, 11 Feb 1997 20:34:44 -0500 (EST)


On Tue, 11 Feb 1997, Paul T Danset wrote:

>
>
> On Tue, 11 Feb 1997, Chris Evans wrote:
>
> >
> > This bug has been plaguing Linux long enough; if you have a test case
> > where you can reproduce the repeated "access conflict" scenario, then by
> > all means get into contact with the author who might be able to send some
> > debugging code to help him sort it out.
[SNIPPED]

>
> o The only people that seem to be having the "Transmitter access conflict"
>   network hang problem are those on busy networks with drivers that are
> very similar in structure to 3c59x.c. For example, I've never seen
> people complaining of similar problems with the smc-ultra.c driver ...
> even though it was also written in large part by the author of 3c59x.c.
> smc-ultra.c looks quite different from 3c59x.c, perhaps because Donald
> had to work around an interrupt related bug by using busy-waiting. If
> the problem was with another part of the network layer, then we should be
> seeing errors from many other boards.
>
> o I also don't think it is a hardware problem, since the same machine
> converted to NT 4.0 has never experienced a hang. (There seems to
> be more and more of these sprouting around here, like pods in the
> "Invasion of the Body Snatchers" :)
[SNIPPED]
I don't have one of those cards to try, but look at the source in the area
you mention, just above the "access conflict" error message (I am at
home on a crappy terminal, so I can't cut and paste into this mailer). There
are a few lines of code that drop the packets and reinitialize the controller.

Just cut and paste that code into the spot where the "access conflict" occurs.
The idea is that when the controller refuses to work, you reinitialize it.

Note that a few packets will be dropped. So what? Thousands of packets are
dropped on the physical wire!

When writing an Ethernet driver, the idea is to get it back on-line as
quickly as possible without even trying to sort out the problem. The
faster the card, the more often it locks up. The driver should only detect
that the lockup occurred and then reinitialize the card. Don't worry about
dropped packets (seriously); it takes too much time. You might want to
keep track of the number of TX failures for "/proc", but don't bother
calling printk() unless you are debugging.
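
To make the detect-and-reinitialize idea concrete, here is a minimal
user-space sketch in C. It is not code from 3c59x.c or any real driver:
struct fake_nic, fake_nic_reinit(), fake_nic_send() and the FAKE_NIC_DEBUG
macro are all hypothetical names, and the "stuck" flag merely stands in for
whatever status bit a real controller would expose.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a network controller; not a real register layout. */
struct fake_nic {
    bool tx_stuck;              /* simulated "access conflict" condition */
    unsigned long tx_packets;   /* frames handed to the hardware */
    unsigned long tx_failures;  /* lockups detected; the kind of counter one might export via /proc */
    unsigned long resets;       /* times the controller was reinitialized */
};

/* Put the controller back into a known-good state. */
static void fake_nic_reinit(struct fake_nic *nic)
{
    nic->tx_stuck = false;
    nic->resets++;
}

/*
 * Try to send one frame. On a detected lockup, drop the frame, count the
 * failure, and reinitialize right away instead of diagnosing the cause.
 * Returns true if the frame was handed to the hardware.
 */
static bool fake_nic_send(struct fake_nic *nic, const void *frame, size_t len)
{
    (void)frame;    /* the simulation never touches the payload */
    (void)len;

    if (nic->tx_stuck) {
        nic->tx_failures++;
#ifdef FAKE_NIC_DEBUG
        /* Logging is compiled in only for debugging builds. */
        fprintf(stderr, "fake_nic: transmitter stuck, reinitializing\n");
#endif
        fake_nic_reinit(nic);
        return false;   /* this frame is dropped */
    }

    nic->tx_packets++;
    return true;
}
```

Setting tx_stuck before a call simulates a lockup: that one frame is lost,
tx_failures and resets each go up by one, and the next send goes through.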

Note that if your network has NetBEUI (or however you spell J_u_n_k),
every DATA packet to those machines is a BROADCAST packet that your
controller has to receive, then pass to the kernel so it can throw it
away. I hacked up an NE* driver and removed most of the code! This improved
my network throughput at least twofold. Note that the kernel doesn't
have to know that a packet got lost. The protocols above the driver (TCP,
for instance) will re-send or request missing packets. I made my hacked
driver faster by pretending that it always sent everything fine. However,
internally I have to make certain that the next packet(s) will be handled
okay. I just keep the fixups secret.
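
A toy illustration of why the layers above can tolerate a driver that
always claims success, again as a user-space C sketch and not the hacked
NE* driver itself. Every name here (struct quiet_nic, quiet_nic_xmit(),
send_until_acked()) is hypothetical, and the retry loop is only a crude
stand-in for a retransmitting protocol.

```c
#include <stdbool.h>

/* Hypothetical driver state for the simulation. */
struct quiet_nic {
    bool wedged;              /* simulated controller lockup */
    unsigned long delivered;  /* frames that actually reached the wire */
    unsigned long dropped;    /* frames silently lost during a lockup */
};

/*
 * Always reports success (0) to the caller. When wedged, the frame is
 * dropped and the controller is quietly repaired so the next frame works.
 */
static int quiet_nic_xmit(struct quiet_nic *nic, int seq, int *peer_has)
{
    if (nic->wedged) {
        nic->dropped++;
        nic->wedged = false;  /* the hidden fixup */
    } else {
        nic->delivered++;
        *peer_has = seq;      /* the peer now holds this sequence number */
    }
    return 0;
}

/*
 * Stand-in for a retransmitting upper layer: resend until the peer has
 * the frame. Returns the number of attempts used, or -1 on giving up.
 */
static int send_until_acked(struct quiet_nic *nic, int seq, int max_tries)
{
    int peer_has = -1;

    for (int attempt = 1; attempt <= max_tries; attempt++) {
        quiet_nic_xmit(nic, seq, &peer_has);
        if (peer_has == seq)  /* stands in for receiving an acknowledgement */
            return attempt;
    }
    return -1;
}
```

With the card wedged at the start, the first attempt is lost without any
error being reported, and the second attempt gets through.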

Cheers,
Dick Johnson
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Richard B. Johnson
Project Engineer
Analogic Corporation
Voice : (508) 977-3000 ext. 3754
Fax : (508) 532-6097
Modem : (508) 977-6870
Ftp : ftp@boneserver.analogic.com
Email : rjohnson@analogic.com, johnson@analogic.com
Penguin : Linux version 2.1.26 on an i586 machine (66.15 BogoMips).
Warning : It's hard to remain at the trailing edge of technology.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-