Re: [PATCH v3 05/11] x86/irq: Process nmi sources in NMI handler

From: Xin Li
Date: Fri Jun 28 2024 - 23:40:15 EST


On 6/28/2024 1:18 PM, Jacob Pan wrote:
With NMI source reporting enabled, NMI handler can prioritize the
handling of sources reported explicitly. If the source is unknown, then
resume the existing processing flow. i.e. invoke all NMI handlers.

Signed-off-by: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>

The code looks good to me; however, please improve the coding style and
comments, see below.


---
v3:
- Use a static flag to disable NMIs in case of HW failure
- Optimize the case when unknown NMIs are mixed with known NMIs(HPA)
v2:
- Disable NMI source reporting once garbage data is given in FRED
return stack. (HPA)
---
arch/x86/kernel/nmi.c | 73 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 70 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 639a34e78bc9..c3a10af7f26b 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -149,23 +149,90 @@ static inline int do_handle_nmi(struct nmiaction *a, struct pt_regs *regs, unsig
return thishandled;
}
+static int nmi_handle_src(unsigned int type, struct pt_regs *regs, unsigned long *handled_mask)
+{
+ static bool nmi_source_disabled;
+ bool has_unknown_src = false;
+ unsigned long source_bitmask;
+ struct nmiaction *a;
+ int handled = 0;
+ int vec = 1;
+
+ if (!cpu_feature_enabled(X86_FEATURE_NMI_SOURCE) ||
+ type != NMI_LOCAL || nmi_source_disabled)

This is harder to read; there is no need to break it into 2 lines.

+ return 0;
+
+ source_bitmask = fred_event_data(regs);
+ if (!source_bitmask) {

unlikely()?

+ pr_warn("NMI received without source information! Disable source reporting.\n");

It sounds like you're disabling some hardware functionality. Maybe
better to say:

Buggy hardware? Disable NMI source handling.
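
To illustrate the two remarks above together, here is a minimal user-space
sketch, not kernel code. It assumes a GCC/Clang compiler for
__builtin_expect(); source_disabled and source_bitmap_usable() are
hypothetical names standing in for nmi_source_disabled and the check in
nmi_handle_src(). It shows the zero-bitmap path hinted as unlikely and the
one-shot software disable:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Same shape as the kernel's unlikely() hint (GCC/Clang builtin). */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for the patch's static nmi_source_disabled flag. */
static bool source_disabled;

/* Returns true when the bitmap may be used for source-directed handling. */
static bool source_bitmap_usable(uint64_t bitmap)
{
	if (source_disabled)
		return false;

	/* An empty bitmap should never happen on sane hardware. */
	if (unlikely(!bitmap)) {
		fprintf(stderr, "Buggy hardware? Disable NMI source handling.\n");
		source_disabled = true;	/* latched: never trust the bitmap again */
		return false;
	}

	return true;
}
```

Once an empty bitmap has been seen, every later call returns false, so only
the software handling is switched off and the message says so.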

+ nmi_source_disabled = true;
+ return 0;
+ }
+
+ /*
+ * Per NMI source specification, there is no guarantee that a valid
+ * NMI vector is always delivered, even when the source specified
+ * one. It is software's responsibility to check all available NMI
+ * sources when bit 0 is set in the NMI source bitmap. i.e. we have

s/i.e./I.e.,/

+ * to call every handler as if we have no NMI source.

This comment is misleading, because in polling you do skip the NMI
handlers whose source bits are set.

And add an empty line to ease review.

+ * On the other hand, if we do get non-zero vectors, we know exactly
+ * what the sources are. So we only call the handlers with the bit set.
+ */
+ if (source_bitmask & BIT(NMI_SOURCE_VEC_UNKNOWN)) {
+ pr_warn_ratelimited("NMI received with unknown source\n");

s/source/sources/

+ has_unknown_src = true;
+ }
+
+ rcu_read_lock();

Add an empty line.

+ /* Bit 0 is for unknown NMI sources, skip it. */

Put "vec = 1" close to this comment.

+ for_each_set_bit_from(vec, &source_bitmask, NR_NMI_SOURCE_VECTORS) {
+ a = rcu_dereference(nmiaction_src_table[vec]);
+ if (!a) {
+ pr_warn_ratelimited("NMI received %d no handler", vec);

Use a better log message.

+ continue;
+ }

Empty line again.

+ handled += do_handle_nmi(a, regs, type);

Ditto.

+ /*
+ * Needs polling if unknown source bit is set, handled_mask is

^the

+ * used to tell the polling code which NMIs can be skipped.
+ */
+ if (has_unknown_src)
+ *handled_mask |= BIT(vec);
+ }

Empty line please.

+ rcu_read_unlock();
+
+ return handled;
+}
+
static int nmi_handle(unsigned int type, struct pt_regs *regs)
{
struct nmi_desc *desc = nmi_to_desc(type);
+ unsigned long handled_mask = 0;
struct nmiaction *a;
int handled=0;
- rcu_read_lock();
+ /*
+ * Check if the NMI source handling is complete, otherwise polling is
+ * still required. handled_mask is non-zero if NMI source handling is
+ * partial due to unknown NMI sources.
+ */
+ handled = nmi_handle_src(type, regs, &handled_mask);
+ if (handled && !handled_mask)
+ return handled;
+ rcu_read_lock();

Keep the original empty lines around it.

/*
* NMIs are edge-triggered, which means if you have enough
* of them concurrently, you can lose some because only one
* can be latched at any given time. Walk the whole list
* to handle those situations.
*/
- list_for_each_entry_rcu(a, &desc->head, list)
+ list_for_each_entry_rcu(a, &desc->head, list) {
+ /* Skip NMIs handled earlier with source info */
+ if (BIT(a->source_vec) & handled_mask)
+ continue;
handled += do_handle_nmi(a, regs, type);
-
+ }
rcu_read_unlock();

Keep the original empty lines around it.

/* return total number of NMI events handled */
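
For reference, the two-phase flow discussed in this review can be modelled
in user space roughly as follows. This is only a sketch under stated
assumptions: the handler table, NR_SRC_VECTORS, handle_by_source() and
handle_all() are hypothetical names, a flat array replaces the RCU-protected
list and nmiaction_src_table, and the zero-bitmap disable path is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_SRC_VECTORS	16	/* assumption: table size for this model */
#define SRC_VEC_UNKNOWN	0	/* bit 0: at least one source is unknown */

struct handler {
	unsigned int source_vec;	/* 0: no dedicated source vector */
	int (*fn)(void);		/* returns number of events handled */
};

static int calls[3];		/* per-handler call counters */
static int h_a(void) { calls[0]++; return 1; }
static int h_b(void) { calls[1]++; return 1; }
static int h_c(void) { calls[2]++; return 1; }

static const struct handler handlers[] = {
	{ 1, h_a }, { 2, h_b }, { 0, h_c },
};
#define NR_HANDLERS (sizeof(handlers) / sizeof(handlers[0]))

/* Phase 1: call only the handlers whose source bit is set. */
static int handle_by_source(uint64_t bitmap, uint64_t *handled_mask)
{
	int unknown = (bitmap >> SRC_VEC_UNKNOWN) & 1;
	int handled = 0;

	/* Bit 0 is for unknown sources, so start at vector 1. */
	for (unsigned int vec = 1; vec < NR_SRC_VECTORS; vec++) {
		if (!(bitmap & (UINT64_C(1) << vec)))
			continue;

		for (size_t i = 0; i < NR_HANDLERS; i++) {
			if (handlers[i].source_vec != vec)
				continue;

			handled += handlers[i].fn();

			/* Polling follows: remember what can be skipped. */
			if (unknown)
				*handled_mask |= UINT64_C(1) << vec;
		}
	}

	return handled;
}

/* Phase 2: poll the remaining handlers unless phase 1 was complete. */
static int handle_all(uint64_t bitmap)
{
	uint64_t handled_mask = 0;
	int handled = handle_by_source(bitmap, &handled_mask);

	if (handled && !handled_mask)
		return handled;

	for (size_t i = 0; i < NR_HANDLERS; i++) {
		/* Skip handlers already called with source info. */
		if ((UINT64_C(1) << handlers[i].source_vec) & handled_mask)
			continue;

		handled += handlers[i].fn();
	}

	return handled;
}
```

With a bitmap of 0x2 only the vector 1 handler runs. With 0x3 the vector 1
handler runs once in phase 1 and is skipped in polling, while the others are
polled. That skipping is why the "call every handler" wording in the comment
is misleading.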