Re: [PATCH] x86/mce: Restore MCA polling interval halving
From: Borislav Petkov
Date: Wed Apr 15 2026 - 15:28:26 EST
On Tue, Apr 14, 2026 at 03:22:23PM -0700, Luck, Tony wrote:
> Ran my own test. RAS_CEC disabled. Booted with mce=no_cmci and injected a
> corrected error every twenty seconds. Added pr_info() to mce_timer_fn()
> to say which CPUs were doubling or halving interval.
Right, we still need some sort of feedback that we've logged an error.
> Results:
>
> I did see some "Machine check events logged" console messages.
Right, mce_timer_fn().
Not sure that is the right place tho. We want to issue that printk the moment
we log an MCE, perhaps in the early notifier or so, where mce_notify_irq()
was.
> The debug messages are "interesting". Polling timers on CPUs aren't
> synchronized, so I got random bursts of debug messages where some
> CPUs found an error and halved their interval, while others didn't
> see an error and doubled their interval. The machine check banks for
> memory corrected errors are socket scoped, so when an error is logged
> whichever CPU on the socket polls next will find the error.
>
> Both mcelog and EDAC were invoked on the mce decode chain and logged
> errors OK.
>
> When I stopped injecting, all the CPUs doubled back up to maximum
> polling interval.
>
> Summary: This is working as well as can be expected given the shared
> scope of the machine check banks. If Linux were to understand the
> scope of machine check banks it might designate a single CPU in
> that scope to do the polling. But Intel doesn't make it easy to derive
> the scope. In any case, the common case is CMCI enabled.
Thanks for the testing - much appreciated.
One aspect remained unanswered:
mce_notify_irq -> mce_work_trigger -> schedule_work(&mce_trigger_work); ->
mce_do_trigger ->
call_usermodehelper(mce_helper, mce_helper_argv, NULL, UMH_NO_WAIT);
Is that thing still used?
If so, what is the use case? Is that mce_helper perchance the userspace
mcelog tool which the kernel calls here on an MCE?
Or?
Do we need that still?
If not, ripping that out would be a nice, additional simplification.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette