Re: [PATCH] x86, UV: Fix NMI handler for UV platforms

From: Don Zickus
Date: Tue Mar 22 2011 - 18:05:41 EST

On Tue, Mar 22, 2011 at 04:25:19PM -0500, Jack Steiner wrote:
> > > AFAICT, the UV nmi handler is not consuming extra NMI interrupts. I can't
> > > rule out that I'm missing something but I don't see it.
> >
> > What happens if you put the UV nmi handler below the hw_perf handler in
> > priority? I assume the DIE_NMIUNKNOWN snippet in the hw_perf handler will
> > swallow some of the UV NMIs, but more importantly does it still generate
> > the hang you see?
> I verified that the failures ("perf top" stops) are the same on both RHEL6.1 & the
> latest x86 2.6.38+ tree.

Thanks for testing that.

> I switched priorities & as expected, "perf top" no longer hangs. I see an occasional
> missed UV NMI - about 1 every minute. I also see a few "dazed" messages as
> well - 3 in a 5 minute period. This testing was done on a 2.6.38+ kernel.
> I'm running on a 48p system.
> Ideas?

Wow, interesting.

The first thing: in 'uv_handle_nmi', can you change that from
DIE_NMIUNKNOWN back to DIE_NMI? Originally I set it to DIE_NMIUNKNOWN
because I didn't think you guys had the ability to determine if your BMC
generated the NMI or not. Recently George B. said you guys added a register
bit to determine this, so I am wondering if promoting this would fix
the missed UV NMI. I am speculating this is being swallowed by the
hw_perf DIE_NMIUNKNOWN exception path.
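
A rough userspace sketch of that speculation, not kernel code. The enum
names are borrowed from this thread; everything prefixed model_ is
hypothetical. It assumes the chain is walked once for DIE_NMI and, if nobody
claims the NMI, once more for DIE_NMIUNKNOWN, that hw_perf sits above the UV
handler, and that hw_perf will eat one unattributed NMI when it expects a
back-to-back one. uv_bit_set stands in for the register bit mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; the names mirror the thread, not kernel headers. */
enum die_val { DIE_NMI, DIE_NMIUNKNOWN };
enum verdict { NOTIFY_DONE, NOTIFY_STOP };

struct model_state {
    enum die_val uv_listens_on; /* which pass the UV handler claims NMIs in */
    bool uv_bit_set;            /* assumed "BMC raised this NMI" register bit */
    bool perf_overflow;         /* a PMU counter really overflowed */
    bool perf_expects_extra;    /* perf anticipates a back-to-back NMI */
    int uv_handled;
    int perf_handled;
    int perf_swallowed;
};

typedef enum verdict (*nmi_handler)(struct model_state *, enum die_val);

static enum verdict model_perf_handler(struct model_state *s, enum die_val val)
{
    if (val == DIE_NMI && s->perf_overflow) {
        s->perf_overflow = false;
        s->perf_handled++;
        return NOTIFY_STOP;
    }
    if (val == DIE_NMIUNKNOWN && s->perf_expects_extra) {
        s->perf_expects_extra = false;
        s->perf_swallowed++;    /* eats an NMI it cannot attribute */
        return NOTIFY_STOP;
    }
    return NOTIFY_DONE;
}

static enum verdict model_uv_handler(struct model_state *s, enum die_val val)
{
    if (val != s->uv_listens_on || !s->uv_bit_set)
        return NOTIFY_DONE;
    s->uv_bit_set = false;
    s->uv_handled++;
    return NOTIFY_STOP;
}

/* perf sits above UV, as in the switched-priority test. */
static const nmi_handler model_chain[] = { model_perf_handler, model_uv_handler };

/* Returns true if some handler claimed the NMI, false for the "dazed" case. */
bool model_deliver_nmi(struct model_state *s)
{
    const enum die_val passes[] = { DIE_NMI, DIE_NMIUNKNOWN };

    assert(s != NULL);
    for (size_t p = 0; p < 2; p++)
        for (size_t h = 0; h < 2; h++)
            if (model_chain[h](s, passes[p]) == NOTIFY_STOP)
                return true;
    return false;
}
```

In this model, a UV NMI that arrives while perf_expects_extra is set is lost
to hw_perf when uv_listens_on is DIE_NMIUNKNOWN, and is claimed by the UV
handler when uv_listens_on is DIE_NMI.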

Second, the "dazed" messages are being seen on other machines (currently
core2quads) when using perf with lots of NMI events. So you might be
seeing a second, more common issue there. I still need to find time to
debug that.

Finally, I am scratching my head about the 'perf top' no longer hangs
part. The only thing I can think of is that under high perf load (without
extra NMIs from your BMC), we have seen extra NMIs get generated while
processing the current NMI (mainly because Nehalems have I think 4 or 8
PMUs that can be active at once, so multiple NMIs can trigger here).
But we can recover from this because we check _all_ the PMIs during the
NMI (which currently always comes from the PMU).

Now this extra NMI from the PMU can also happen on a singly activated
PMU because we reload the PMU, then check the events to see if we should
disable it. By the time we finish checking (and determine we are not done
yet), the event could have rolled over and generated another NMI before we
have finished processing the current one.

So throw an external NMI into the above situation (which gets dropped
as the third NMI, I believe, if I read the history of these NMI things
correctly), and it is possible that if uv_handle_nmi is called first it
could swallow the extra NMI as its own and leave hw_perf hanging.
(that's a mouthful, huh?)
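
To make the "third NMI gets dropped" part concrete, here is a small
userspace model, not kernel code. It assumes that while one NMI is being
serviced, exactly one further NMI can be held pending and anything beyond
that is lost. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of the assumed latch behaviour; not kernel code. */
struct nmi_latch {
    bool in_handler; /* an NMI is being serviced, further NMIs are blocked */
    bool latched;    /* one NMI is held pending until the handler returns */
    int delivered;   /* how many times the NMI handler was entered */
    int dropped;     /* NMIs raised while one was already latched */
};

/* A source (PMU or BMC) asserts NMI. */
void latch_raise(struct nmi_latch *l)
{
    assert(l != NULL);
    if (!l->in_handler) {
        l->in_handler = true;   /* delivered straight away */
        l->delivered++;
    } else if (!l->latched) {
        l->latched = true;      /* second NMI: held until iret */
    } else {
        l->dropped++;           /* third NMI: nowhere to record it */
    }
}

/* The handler returns (iret); a latched NMI is delivered immediately. */
void latch_iret(struct nmi_latch *l)
{
    assert(l != NULL && l->in_handler);
    l->in_handler = false;
    if (l->latched) {
        l->latched = false;
        l->in_handler = true;
        l->delivered++;
    }
}
```

Raising three NMIs in a row (PMU, PMU rollover, BMC) and then returning
twice enters the handler only twice in this model, so two handler runs have
to account for three events from two different sources.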

Then again with the priorities switched I guess the opposite is true too,
that your BMC is left missing an event.

This sort of supports the need for your patch earlier or something similar
which says to ignore the handler's return code and process all the events
on the die_chain anyway. And if no one has handled the NMI, then trigger an
unknown NMI.
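
A minimal userspace sketch of the two policies, not the actual patch. It
assumes each source can tell on its own whether it has work pending; all
names here are hypothetical, and the UV handler is placed first to match the
original ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-source state for the model. */
struct nmi_sources {
    bool perf_overflow; /* PMU has an unprocessed overflow */
    bool uv_bit_set;    /* assumed "BMC raised an NMI" indication */
    int perf_handled;
    int uv_handled;
    int unknown;        /* NMIs nobody claimed */
};

typedef bool (*source_handler)(struct nmi_sources *);

static bool sketch_uv(struct nmi_sources *s)
{
    if (!s->uv_bit_set)
        return false;
    s->uv_bit_set = false;
    s->uv_handled++;
    return true;
}

static bool sketch_perf(struct nmi_sources *s)
{
    if (!s->perf_overflow)
        return false;
    s->perf_overflow = false;
    s->perf_handled++;
    return true;
}

static const source_handler sketch_chain[] = { sketch_uv, sketch_perf };
#define SKETCH_CHAIN_LEN (sizeof(sketch_chain) / sizeof(sketch_chain[0]))

/* Policy 1: stop at the first handler that claims the NMI. */
void nmi_stop_at_first(struct nmi_sources *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < SKETCH_CHAIN_LEN; i++)
        if (sketch_chain[i](s))
            return;
    s->unknown++;
}

/* Policy 2: run every handler; report unknown only if none claimed it. */
void nmi_run_all(struct nmi_sources *s)
{
    bool handled = false;

    assert(s != NULL);
    for (size_t i = 0; i < SKETCH_CHAIN_LEN; i++)
        if (sketch_chain[i](s))
            handled = true;
    if (!handled)
        s->unknown++;
}
```

With both sources pending and only one NMI delivered, nmi_stop_at_first()
leaves perf_overflow set, while nmi_run_all() clears both. The cost is that
every handler runs on every NMI.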

Unless there is a way to determine if an NMI is latched or not before
issuing the iret and, if so, assume we dropped an NMI and process everyone.

I'll need to think of a way to prove all this in the morning (or maybe

I hope that makes some sense as it is late and my brain is shutting down.
