Re: [PATCH 4/6] x86/irq: Process nmi sources in NMI handler

From: Jacob Pan
Date: Thu May 30 2024 - 12:05:40 EST


Hi Peter,

On Wed, 29 May 2024 13:47:09 -0700, "H. Peter Anvin" <hpa@xxxxxxxxx> wrote:

> On 5/29/24 13:33, Jacob Pan wrote:
> > +
> > + /*
> > + * Per NMI source specification, there is no guarantee that a
> > valid
> > + * NMI vector is always delivered, even when the source
> > specified
> > + * one. It is software's responsibility to check all available
> > NMI
> > + * sources when bit 0 is set in the NMI source bitmap. i.e. we
> > have
> > + * to call every handler as if we have no NMI source.
> > + * On the other hand, if we do get non-zero vectors, we know
> > exactly
> > + * what the sources are. So we only call the handlers with the
> > bit set.
> > + */
> > + if (source_bitmask & BIT(NMI_SOURCE_VEC_UNKNOWN)) {
> > + pr_warn_ratelimited("NMI received with unknown
> > source\n");
> > + return 0;
> > + }
> > +
>
> Note: if bit 0 is set, you can process any other bits first (on the
> general assumption that if you bother with NMI source then those events
> are performance sensitive), and you could even exclude them from the
> poll. This is an optimization, and what you have here is correct from a
> functional point of view.
>
Yes, it is a good optimization but also a rare case that bit 0 is set. no?

> > + source_bitmask = fred_event_data(regs);
> > + if (!source_bitmask) {
> > + pr_warn_ratelimited("NMI received without source
> > information!\n");
> > + return 0;
> > + }
>
> If the event data word is 0, it probably should be treated as a
> *permanent* failure, as it is a Should Not Happen[TM] situation, and
> means there is an implementation (or, perhaps more likely,
> virtualization!) bug, and as such it may not be safe to trust the NMI
> source information in the future.
>
Good point, I will add a flag to permanently disable NMI source reporting
for this boot cycle if that happens.

> > + if (!cpu_feature_enabled(X86_FEATURE_NMI_SOURCE) || type !=
> > NMI_LOCAL)
> > + return 0;
>
> I'm not sure I understand why you are requiring type to be NMI_LOCAL here?
>
It is just for this current implementation I am not including external
NMIs. AFAIK, there is no users, i.e. no device MSIs delivered as NMI. I saw
effort trying to make HPET NMI watchdog but not materialized.

Thanks,

Jacob