Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered

From: Ian Kumlien
Date: Sun Dec 07 2003 - 16:33:00 EST

On Sun, 2003-12-07 at 21:59, Jesse Allen wrote:
> On Sun, Dec 07, 2003 at 08:58:47PM +0100, Ian Kumlien wrote:
> > > There are three parts to this email.
> > > a) apic mods.
> > > Lockups are due to too fast an apic acknowledge of apic timer int.
> > > Apic hard locked up the system - no nmi debug available.
> > > Fixed it by introducing a delay of at least 500ns into
> > > smp_apic_timer_interrupt() just prior to ack_APIC_irq().
> > > b) io-apic mods
> > > So I have fixed it too (tested on both my epox and albatron MOBOs).
> > > Firstly I found 8254 connected directly to pin 0 not pin 2 of io-apic.
> > > I have modified check_timer() in io_apic.c to trial connect pin and
> > > test for it after the existing test for connection to io-apic.
> So do you think Ross has found the connection between all three
> issues? (Timer and NMI watchdog, IRQ 7, and CPU disconnect?)

Since disabling CPU disconnect or adding this delay both seems to fix
the deadlock, i see the rest as cool stuff.

> I suppose we should now try these changes other the last two. From
> what he's saying, this will fix the lockup too. Hopefully the patch
> will be refined =). I probably won't be able to run this till
> tomorrow, and after I get it diffed for 2.6. I've barely got the cpu
> disconnect patch going today, but haven't tested it because I can only
> access my nforce2 machine remotely =). But from everything I'm
> hearing, the lockups are gone.

Yes, at least for me, and my machine really liked to deadlock.

I'm wondering however, could the ACK be moved down in the code? after
some handling? (i have no clue of how the code works, and even less
about how the hw part works, fyi)
But i thought that if it's moved then that might be a generic way to fix
it without adding a blocking delay.

Ian Kumlien <pomac () vapor ! com> --

Attachment: signature.asc
Description: This is a digitally signed message part