Re: [PATCH v2 2/3] x86/mce: Move message printing from mce_notify_irq to mce_early_notifier()

From: Borislav Petkov
Date: Tue Feb 25 2025 - 08:14:42 EST


On Mon, Feb 10, 2025 at 05:47:05PM +0200, Nikolay Borisov wrote:
> Informing the user that an MCE has been logged from mce_notify_irq() is
> somewhat misleading because whether the MCE has been logged actually
> depends on whether CONFIG_X86_MCELOG_LEGACY is turned on or not.

That text needs an update in light of what we talked about when looking at
patch 1...

> Furthermore it was reported that actually having a message triggered
> when an MCE is generated can be helpful in certain scenarios.

That's too vague - needs proper justification.

> Improve the situation by lifting the printing to the generic
> mce_early_notifier() as it's executed always and is independent of any
> compile-time option.

Meh.

> Link: https://lore.kernel.org/all/CY8PR11MB7134D97F82DC001AE009637889E32@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

Ah, there's the justification. I guess...

Just don't put "customers" in the commit message.

> Signed-off-by: Nikolay Borisov <nik.borisov@xxxxxxxx>
> ---
> arch/x86/kernel/cpu/mce/core.c | 12 +++++-------
> 1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 89625ff79c3b..d55b1903fde6 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -591,15 +591,8 @@ EXPORT_SYMBOL_GPL(mce_is_correctable);
>   */
>  static int mce_notify_irq(void)
>  {
> -	/* Not more than two messages every minute */
> -	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
> -
>  	if (test_and_clear_bit(0, &mce_need_notify)) {
>  		mce_work_trigger();
> -
> -		if (__ratelimit(&ratelimit))
> -			pr_info(HW_ERR "Machine check events logged\n");
> -
>  		return 1;
>  	}
>
> @@ -609,6 +602,8 @@ static int mce_notify_irq(void)
>  static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
>  			      void *data)
>  {
> +	/* Not more than two messages every minute */
> +	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
>  	struct mce_hw_err *err = to_mce_hw_err(data);
>
>  	if (!err)
> @@ -619,6 +614,9 @@ static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
>
>  	set_bit(0, &mce_need_notify);
>
> +	if (__ratelimit(&ratelimit))
> +		pr_info(HW_ERR "Machine check event detected\n");

Well, the previous "logged" was correct.
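
For anyone following along, here is a rough userspace sketch of what the
moved DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2) / __ratelimit() pair
expresses: at most "burst" messages per "interval" window. This is not the
kernel implementation. struct rl_state, rl_allow() and rl_count_allowed()
are hypothetical names made up for illustration, and time is passed in as
plain seconds instead of being read from jiffies.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's ratelimit state. */
struct rl_state {
	long interval;	/* window length in seconds (the kernel uses jiffies) */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages already emitted in the current window */
	bool started;	/* false until the first event opens a window */
};

/* Return true if a message may be emitted at time 'now' (seconds). */
static bool rl_allow(struct rl_state *rs, long now)
{
	/* Open a fresh window on first use or once the old one has expired. */
	if (!rs->started || now - rs->begin > rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = true;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	/* Burst exhausted: suppress until the window rolls over. */
	return false;
}

/* Feed 'n' events spaced 'step' seconds apart, starting at time 0. */
static int rl_count_allowed(struct rl_state *rs, int n, long step)
{
	int allowed = 0;

	for (int i = 0; i < n; i++)
		if (rl_allow(rs, i * step))
			allowed++;

	return allowed;
}
```

With interval = 60 and burst = 2, a storm of events inside one minute gets
two messages through and the rest are dropped, regardless of which function
the limiter state lives in.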

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette