[PATCH] x86/mce: Restore MCA polling interval halving
From: Borislav Petkov
Date: Mon Apr 06 2026 - 18:49:50 EST
Ok,
finally. :-\
Pls run it to make sure it DTRT for you too.
Thx.
---
From: "Borislav Petkov (AMD)" <bp@xxxxxxxxx>
Date: Mon, 16 Mar 2026 16:12:00 +0100
Subject: [PATCH] x86/mce: Restore MCA polling interval halving
RongQing reported that the MCA polling interval doesn't halve when an
error gets logged. It was traced down to the commit in Fixes: because:
  mce_timer_fn()
   |-> mce_poll_banks()
     |-> machine_check_poll()
       |-> mce_log()
which will queue the work and return.
Now, back in mce_timer_fn():
	/*
	 * Alert userspace if needed. If we logged an MCE, reduce the polling
	 * interval, otherwise increase the polling interval.
	 */
	if (mce_notify_irq())

<--- here we haven't run the notifier chain yet, so mce_need_notify is
not set yet; the check fails and we don't halve the interval iv.
Only now does the notifier chain run: mce_early_notifier() sets the bit
and calls mce_notify_irq(), which clears it again, and a little later
the notifier chain logs the error.
So this is a silly timing issue.
But, that's all unnecessary.
All that needs to happen here is that the "should we notify of a logged
MCE" question which mce_notify_irq() asks becomes a simple question to
the MCE gen pool:
"Are you empty?"
And that then turns into a simple yes or no answer and it all
JustWorks(tm).
So do that.
Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
Reported-by: Li RongQing <lirongqing@xxxxxxxxx>
Signed-off-by: Borislav Petkov (AMD) <bp@xxxxxxxxx>
Link: https://lore.kernel.org/r/20260112082747.2842-1-lirongqing@xxxxxxxxx
---
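Not part of the commit message: for anyone who wants to see the ordering
problem in isolation, here is a minimal user-space sketch. It is not
kernel code. struct sim, the sim_*() helpers and the SIM_IV_* bounds are
all hypothetical stand-ins, and the deferred work is modelled as a plain
function call. It only shows that checking a flag which the notifier
sets later makes the timer double the interval, while checking the pool
for pending records makes it halve.

```c
#include <stdbool.h>

#define SIM_IV_MIN	1L	/* hypothetical lower bound for the interval */
#define SIM_IV_MAX	64L	/* hypothetical upper bound for the interval */

/* Hypothetical model state, not the kernel's data structures. */
struct sim {
	int  pool_entries;	/* records pending in the "gen pool" */
	bool need_notify;	/* models bit 0 of the removed mce_need_notify */
	long iv;		/* polling interval, arbitrary units */
};

/* Models mce_log(): store the record; the notifier work runs later. */
static void sim_log(struct sim *s)
{
	s->pool_entries++;
}

/* Old check: test-and-clear of a flag that only the notifier sets. */
static bool sim_notify_old(struct sim *s)
{
	bool was_set = s->need_notify;

	s->need_notify = false;
	return was_set;
}

/* New check: is anything pending in the pool? */
static bool sim_notify_new(const struct sim *s)
{
	return s->pool_entries > 0;
}

/*
 * Models the deferred work in the old scheme: the early notifier sets
 * the flag and consumes it itself, then the records are drained.
 */
static void sim_run_work_old(struct sim *s)
{
	s->need_notify = true;
	(void)sim_notify_old(s);
	s->pool_entries = 0;
}

/* Halve the interval if an error was seen, otherwise double it. */
static long sim_adjust(struct sim *s, bool logged)
{
	if (logged)
		s->iv = s->iv / 2 > SIM_IV_MIN ? s->iv / 2 : SIM_IV_MIN;
	else
		s->iv = s->iv * 2 < SIM_IV_MAX ? s->iv * 2 : SIM_IV_MAX;
	return s->iv;
}

/* One timer tick that finds an error, old scheme. */
static long sim_timer_old(struct sim *s)
{
	sim_log(s);
	long iv = sim_adjust(s, sim_notify_old(s));	/* flag not set yet */

	sim_run_work_old(s);	/* the work runs only after the check */
	return iv;
}

/* One timer tick that finds an error, new scheme. */
static long sim_timer_new(struct sim *s)
{
	sim_log(s);
	return sim_adjust(s, sim_notify_new(s));	/* pool is not empty */
}
```

Starting from iv = 8 in this model, sim_timer_old() returns 16 and
sim_timer_new() returns 4.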
arch/x86/kernel/cpu/mce/core.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 8dd424ac5de8..d18db7d8d237 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -90,7 +90,6 @@ struct mca_config mca_cfg __read_mostly = {
};
static DEFINE_PER_CPU(struct mce_hw_err, hw_errs_seen);
-static unsigned long mce_need_notify;
/*
* MCA banks polled by the period polling timer for corrected events.
@@ -595,7 +594,7 @@ static bool mce_notify_irq(void)
/* Not more than two messages every minute */
static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
- if (test_and_clear_bit(0, &mce_need_notify)) {
+ if (!mce_gen_pool_empty()) {
mce_work_trigger();
if (__ratelimit(&ratelimit))
@@ -618,10 +617,6 @@ static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
/* Emit the trace record: */
trace_mce_record(err);
- set_bit(0, &mce_need_notify);
-
- mce_notify_irq();
-
return NOTIFY_DONE;
}
--
2.51.0
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette