Re: Lockups - lost interrupt

Wade Hampton (whampton@staffnet.com)
Mon, 13 Sep 1999 14:12:18 -0400


Wade Hampton wrote:
>
> Mike Black wrote:
> >
> > I'm installing the ikd patch on 2.2.12 with Mingo's 2.2.12 raid patch. With
> > some luck, it shouldn't take long to find the problem. I've got one script
> > that takes about 10 minutes to lockup the machine (locks up in either UP or
> > SMP mode).
> Dumb question -- where can I get the ikd patch? I could put it on the
> Dell (the most repeatable crash) and see if I could find the problem.

Results so far on the Dell WS400 (dual PII/300):

1. stock kernel 2.2.12 with kdb 0.5 patch

2. installed ikd patch
   a) had to manually fix the Makefile and arch/i386/config.in
      (.rej files)
   b) set up for serial console
   c) set up for: Detect software lockups,
                  Print %eip,
                  SMP-IOAPIC NMI SW watchdog,
                  IRQ 0
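
For step 2a, the rejected hunks can be located with something like the
sketch below. It only assumes that patch leaves *.rej files next to the
files it could not fully patch; find_rejects is a hypothetical helper
name, not part of the ikd patch or the kernel tree.

```python
import os

def find_rejects(tree):
    """Return the sorted paths of all *.rej files under the given tree."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(tree):
        for name in filenames:
            # patch writes rejected hunks to <original-name>.rej
            if name.endswith(".rej"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Run against the top of the kernel source tree, this would list the
Makefile and arch/i386/config.in rejects that had to be fixed by hand.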

Serial console worked fine. Control-A allowed me to break on the
serial console. All seemed fine. Started X, sound, x11amp on the
soundblaster, a play loop on the crystal. Started mformat a: and a
dd if=junk of=/dev/fd0 loop on the floppy. After about 5 minutes
the system had not hung, so I started an NFS tar from another machine
of about 1GB of MP3 files. After about 5 minutes of this abuse,
the system hung. No response on the serial console, no oopses,
nothing.
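
For anyone wanting to reproduce the floppy part of the load, the dd
loop above can be sketched roughly as follows. This is an illustration
in Python rather than the shell loop actually used; the function name
write_loop, the pass count, and the block size are assumptions, and
"junk" is simply whatever file was fed to dd.

```python
import os

def write_loop(src, dst, passes=10, block_size=512):
    """Copy src onto dst repeatedly, like dd if=src of=dst run in a loop.

    Returns the total number of bytes written across all passes.
    passes and block_size are illustrative defaults, not measured values.
    """
    total = 0
    for _ in range(passes):
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            while True:
                block = fin.read(block_size)
                if not block:
                    break
                fout.write(block)
                total += len(block)
            fout.flush()
            # Force the data out to the device on every pass so the
            # drive is kept busy instead of the page cache.
            os.fsync(fout.fileno())
    return total
```

Calling write_loop("junk", "/dev/fd0") would correspond to the floppy
load described above; the X, sound, and NFS load are not covered here.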

As my display was on X, the "Print %eip" did not help....

During the dd'ing, I was getting:

floppy0: unexpected interrupt
floppy0: sensei repl[0]=80

Trial #2, same as above, but no floppy activity. Did a make xconfig
and it hung....

I am on trial #3.... The kernel is being built to NMI on IRQ 3 (ttyS1)...

> One of the Penguins (dual PIII) is running under load over the weekend
> on 2.2.11 (yes, I went backwards, but I did not have any crashes with
> 2.2.11). I'll let you know how well it does Monday AM.
This machine is still up and running fine. It was the machine doing the
massive NFS copy from the Dell and it is functioning without any errors.
I am continuing to load this machine on 2.2.11 and will keep you posted
if it crashes.

> >
> > There
> > are about three different cases here
> >
> > 1. VIA chipset bug - known, understood, non SMP
> > 2. A few triton boards - probably a hardware issue
> > 3. SMP - looks like a lock bug.
> >
> > Running the ikd patch is the best help here. I think it will show you a
> > spinlock deadlock. The trace from that should find the guilty party
If I ever get a trace.... Should I move the NMI to another IRQ, for
example the one from the serial console? This is an older PII
motherboard!

Cheers,

-- 
W. Wade, Hampton  <whampton@staffnet.com>
