Re: [PATCH] i386: Fix the K7 NMI watchdog checkbit

From: Björn Steinbrink
Date: Fri Jun 08 2007 - 22:34:13 EST


On 2007.06.09 04:27:10 +0200, Björn Steinbrink wrote:
> On 2007.06.08 22:43:25 +0200, Ingo Molnar wrote:
> >
> > * Björn Steinbrink <B.Steinbrink@xxxxxx> wrote:
> >
> > > Anyway, both are bugs and should be fixed. Maybe we're even lucky and
> > > it fixes your hang. *fingers crossed*
> >
> > just to make it clear: the NMI watchdog was working perfectly fine on
> > that box (in v2.6.21 and in dozens of kernel releases before that, for
> > multiple years) before Andi's cleanup patch. So let's find that bug first
> > or revert the cleanups.
>
> Might have been pure luck. ;-) The culprit seems to be commit
> b7471c6da94d30d3deadc55986cc38d1ff57f9ca (from Sep 2006), which
> introduced the check bit to figure out if an NMI was generated by the
> watchdog timer. While the performance counter register on K7 is 64 bits
> wide, the upper 16 bits are reserved and thus using bit 63 as the check
> bit is wrong. A quick check using /dev/cpu/0/msr shows that
> here, the upper 16 bits are zero all the time. Chances are that this is
> not deterministic and you got a 1 in bit 63 due to some random change.

Hrmpf... Should've read the AMD docs first, not some random website. The
upper bits are "read as zero", so while that was another bug fix, it's
unlikely to help in your case. :-(
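
To illustrate why a check bit in the reserved range cannot work, here is a
minimal sketch in plain userspace C. It is not the kernel code: the 48-bit
counter width follows from the "upper 16 bits are reserved" statement above,
and the names k7_read_counter() and watchdog_fired() are hypothetical helpers
made up for this example. The sketch assumes the watchdog arms the counter
with a negative value and treats a cleared check bit as "the counter
overflowed, so this NMI is ours".

```c
#include <stdint.h>

/* Assumption: only the low 48 bits of the K7 counter are implemented. */
#define K7_CTR_BITS 48
#define K7_CTR_MASK ((UINT64_C(1) << K7_CTR_BITS) - 1)

/* Hypothetical model of a counter read: the reserved upper 16 bits
 * always read as zero, whatever value was programmed. */
static uint64_t k7_read_counter(uint64_t programmed)
{
    return programmed & K7_CTR_MASK;
}

/* The counter is armed with a negative value, so the check bit stays
 * set until the counter overflows and wraps to a small value.
 * Returns 1 if the check bit is clear, i.e. the NMI would be
 * attributed to the watchdog. */
static int watchdog_fired(uint64_t ctr, uint64_t check_bit)
{
    return (ctr & check_bit) == 0;
}
```

With a counter armed to -1000000, watchdog_fired() with a check bit of
1ULL << 63 reports "fired" even though no overflow happened, because bit 63
never reads back as 1. With 1ULL << 47, the top bit of the assumed 48-bit
counter, the same read correctly reports "not fired" until the counter wraps.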

Björn