Re: Network hang with 2.4.1-pre9 and 3c59x

From: Andrew Morton (andrewm@uow.edu.au)
Date: Wed Jan 24 2001 - 19:24:49 EST


"Maciej W. Rozycki" wrote:
>
> On Wed, 24 Jan 2001, Andrew Morton wrote:
>
> > This is due to a lost APIC interrupt acknowledgement. A workaround
> > is to boot with the `noapic' LILO option.
> >
> > This long-standing and very nasty problem was discussed extensively
> > a week or two ago. Suspicions were cast at the disable_irq() function
> > but I'm not sure anything 100% conclusive was arrived at.
>
> Not sure if that is 100% conclusive but I decided to develop an APIC
> lockup recovery procedure. Fortunately chips provide us enough
> information we may deal with the problem with moderate pain.

Cool.

> > I guess I'll have to find a way to make disable_irq() go away,
> > see if that helps.
>
> Please don't. This would be hiding problems under a carpet.

Whether it's fixed properly, or kludged in the APIC code or kludged
in the drivers, it needs to be fixed. I've spent nine months
methodically picking away at the 3com driver so it's now very
reliable, and this interrupt problem is the major failure mode.
In fact, the only failure mode, apart from the usual dodgy
ethernet switch negotiation blah.

So I've started to poke at this problem as well. I'd be glad to stop :)

Attached are two patches:

irq-whacker.patch:

This is a patch against the 3com driver which simply calls
disable_irq()/enable_irq() at 100kHz. Enable it with the
`whacker=1' module parm. With this thread running, the
APIC dies within about one second as soon as you start
sending 100baseT traffic through the interface. So it's
nice and reproducible. This testing setup should translate
easily into any PCI netdriver.
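
As a rough illustration of the rate involved (this is not the code in
irq-whacker.patch, which is not reproduced here), the loop can be modelled
in user space. Everything below is an assumption made for the sketch:
model_disable_irq(), model_enable_irq(), whacker_period_us() and
whacker_run() are hypothetical stand-ins, and simulated time replaces the
real delay the kernel thread would use.

```c
/* User-space model of a disable_irq()/enable_irq() "whacker" loop.
 * Hypothetical sketch only; it touches no real interrupt hardware. */
#include <assert.h>

/* Stand-in for the per-IRQ disable depth: 0 means the line is enabled. */
static int irq_depth;

/* Hypothetical stand-ins for the kernel's disable_irq()/enable_irq(). */
static void model_disable_irq(void)
{
    irq_depth++;
}

static void model_enable_irq(void)
{
    assert(irq_depth > 0);      /* an enable must match an earlier disable */
    irq_depth--;
}

/* Period in microseconds between disable/enable pairs at a rate of hz. */
static unsigned long whacker_period_us(unsigned long hz)
{
    assert(hz > 0);
    return 1000000UL / hz;
}

/* Issue disable/enable pairs for duration_us of simulated time.
 * Returns the number of pairs issued. */
static unsigned long whacker_run(unsigned long hz, unsigned long duration_us)
{
    unsigned long period = whacker_period_us(hz);
    unsigned long pairs = 0;
    unsigned long t;

    assert(period > 0);         /* rates above 1 MHz are not modelled */
    for (t = 0; t < duration_us; t += period) {
        model_disable_irq();
        model_enable_irq();
        pairs++;
    }
    return pairs;
}
```

Under these assumptions, whacker_period_us(100000) gives a 10 microsecond
period, so the model issues on the order of 100,000 disable/enable pairs
in the roughly one second it takes the APIC to die under traffic.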

manfred.patch:

Manfred's edge+level trigger hack. This fixes the problem!
It slows down disable_irq()/enable_irq() a bit, but that
doesn't seem to be an issue. A proper fix would be nice, but
this puppy works.

Manfred's ALT+SYSRQ+Q trick also fixes the problem.

Enabling processor focus simply makes interrupts
stop altogether. Haven't looked into this yet.
