[PATCH] x86/mce: Restore MCA polling interval halving

From: Borislav Petkov

Date: Mon Apr 06 2026 - 18:49:50 EST


Ok,

finally. :-\

Pls run it to make sure it DTRT for you too.

Thx.

---
From: "Borislav Petkov (AMD)" <bp@xxxxxxxxx>
Date: Mon, 16 Mar 2026 16:12:00 +0100
Subject: [PATCH] x86/mce: Restore MCA polling interval halving

RongQing reported that the MCA polling interval doesn't halve when an
error gets logged. It was traced down to the commit in Fixes: because:

  mce_timer_fn()
   |-> mce_poll_banks()
     |-> machine_check_poll()
       |-> mce_log()

which will queue the work and return.

Now, back in mce_timer_fn():

	/*
	 * Alert userspace if needed. If we logged an MCE, reduce the polling
	 * interval, otherwise increase the polling interval.
	 */
	if (mce_notify_irq())

<--- here we haven't run the notifier chain yet, so mce_need_notify is
not set yet, so this won't hit and we won't halve the interval iv.

Now the notifier chain runs. mce_early_notifier() sets the bit and calls
mce_notify_irq(), which clears the bit, and then the notifier chain
logs the error a little later.

So this is a silly timing issue.

But, that's all unnecessary.

All that needs to happen here is that the question mce_notify_irq()
asks, "should we notify of a logged MCE?", becomes a simple question
to the MCE gen pool: "Are you empty?"

And that then turns into a simple yes or no answer and it all
JustWorks(tm).

So do that.

Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
Reported-by: Li RongQing <lirongqing@xxxxxxxxx>
Signed-off-by: Borislav Petkov (AMD) <bp@xxxxxxxxx>
Link: https://lore.kernel.org/r/20260112082747.2842-1-lirongqing@xxxxxxxxx
---
arch/x86/kernel/cpu/mce/core.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
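
Not for the commit log: a rough userspace model of the ordering problem
described above, in case it helps with review. Everything in it is
hypothetical (struct mce_sim and the sim_*() helpers are not kernel APIs).
It assumes the deferred work runs only after the timer function has done
its check, and it leaves out the clamping of the interval that the real
timer code does.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct mce_sim {
	int  pool_records;	/* stands in for the MCE gen pool */
	bool need_notify;	/* stands in for the removed mce_need_notify bit */
	bool work_pending;	/* deferred work which runs the notifier chain */
};

/* Poll step: an error is found, added to the pool and work is queued. */
static void sim_log_error(struct mce_sim *s)
{
	s->pool_records++;
	s->work_pending = true;
}

/* Old check: consumes a flag which only the notifier chain sets. */
static bool sim_notify_old(struct mce_sim *s)
{
	bool hit = s->need_notify;

	s->need_notify = false;
	return hit;
}

/* New check: ask the pool directly whether it holds records. */
static bool sim_notify_new(const struct mce_sim *s)
{
	return s->pool_records > 0;
}

/* Deferred work: assumed to run after the timer function's check. */
static void sim_run_work(struct mce_sim *s, bool old_scheme)
{
	if (!s->work_pending)
		return;

	if (old_scheme) {
		s->need_notify = true;	/* early notifier sets the bit ... */
		sim_notify_old(s);	/* ... and consumes it right away */
	}

	s->pool_records = 0;		/* records drained by the chain */
	s->work_pending = false;
}

/* One timer tick which finds an error; returns the next interval. */
static unsigned long sim_timer_tick(struct mce_sim *s, unsigned long iv,
				    bool old_scheme)
{
	bool logged;

	sim_log_error(s);
	logged = old_scheme ? sim_notify_old(s) : sim_notify_new(s);
	iv = logged ? iv / 2 : iv * 2;
	sim_run_work(s, old_scheme);

	return iv;
}
```

With a zero-initialized struct mce_sim, sim_timer_tick(&s, 100, true)
doubles the interval even though an error was logged, while
sim_timer_tick(&s, 100, false) halves it.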

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 8dd424ac5de8..d18db7d8d237 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -90,7 +90,6 @@ struct mca_config mca_cfg __read_mostly = {
 };
 
 static DEFINE_PER_CPU(struct mce_hw_err, hw_errs_seen);
-static unsigned long mce_need_notify;
 
 /*
  * MCA banks polled by the period polling timer for corrected events.
@@ -595,7 +594,7 @@ static bool mce_notify_irq(void)
 	/* Not more than two messages every minute */
 	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
 
-	if (test_and_clear_bit(0, &mce_need_notify)) {
+	if (!mce_gen_pool_empty()) {
 		mce_work_trigger();
 
 		if (__ratelimit(&ratelimit))
@@ -618,10 +617,6 @@ static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
 	/* Emit the trace record: */
 	trace_mce_record(err);
 
-	set_bit(0, &mce_need_notify);
-
-	mce_notify_irq();
-
 	return NOTIFY_DONE;
 }

--
2.51.0


--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette