Re: Re: Re: [External Mail] Re: [PATCH] x86/mce: Fix timer interval adjustment after logging a MCE event

From: Li,Rongqing(ACG CCN)

Date: Fri Mar 06 2026 - 20:20:00 EST




> -----Original Message-----
> From: Borislav Petkov <bp@xxxxxxxxx>
> Sent: March 6, 2026 23:29
> To: Li,Rongqing(ACG CCN) <lirongqing@xxxxxxxxx>
> Cc: Luck, Tony <tony.luck@xxxxxxxxx>; Nikolay Borisov
> <nik.borisov@xxxxxxxx>; Thomas Gleixner <tglx@xxxxxxxxxx>; Ingo Molnar
> <mingo@xxxxxxxxxx>; Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>;
> x86@xxxxxxxxxx; H . Peter Anvin <hpa@xxxxxxxxx>; Yazen Ghannam
> <yazen.ghannam@xxxxxxx>; Zhuo, Qiuxu <qiuxu.zhuo@xxxxxxxxx>; Avadhut
> Naik <avadhut.naik@xxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
> linux-edac@xxxxxxxxxxxxxxx
> Subject: Re: Re: Re: [External Mail] Re: [PATCH] x86/mce: Fix timer interval
> adjustment after logging a MCE event
>
> On Fri, Mar 06, 2026 at 02:38:29PM +0000, Li,Rongqing(ACG CCN) wrote:
> > We anticipate potential UE issues by analyzing the volume and
> > frequency of collected CE reports, enabling us to perform proactive
> > task offloading and machine maintenance. However, inaccuracies in the
> > collected data are currently undermining this approach, making it
> > difficult to reliably predict UE incidents.
>
> This looks like a canned AI reply to me.
>
> I think you wanna say, you want to get *every* single error logged. Yes?
>
> So you want to be able to decrease the polling interval if necessary?
>
> Do you also disable the RAS CEC?
>

CEC may not work in some cases. For example, when QEMU uses vDPA devices, all of QEMU's memory is pinned and cannot be offlined. hugetlbfs has only recently gained support for offlining, and offlining hugetlbfs pages can still cause issues. The kernel provides an interface to disable soft offlining; see the patch "mm/memory-failure: userspace controls soft-offlining pages".

Also, when Correctable Errors (CE) are evaluated in userspace, more parameters can be taken into account, such as memory manufacturer, batch, and frequency, to improve prediction accuracy.
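
A minimal sketch of what such userspace evaluation could look like, assuming CE records have already been collected as (DIMM label, timestamp) pairs. The class name, the vendor names, and every threshold value below are hypothetical placeholders, not taken from any real tool or from this patch:

```python
from collections import defaultdict, deque

# Hypothetical per-vendor limits: CEs tolerated per DIMM within one window.
# Real values would have to come from fleet data; these are placeholders.
CE_LIMITS = {"vendor_a": 10, "vendor_b": 25}
DEFAULT_LIMIT = 15
WINDOW_SECONDS = 3600


class CeTracker:
    """Sliding-window count of correctable errors per DIMM."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.events = defaultdict(deque)  # DIMM label -> CE timestamps

    def record(self, dimm, timestamp):
        """Store one CE and drop timestamps that fell out of the window."""
        q = self.events[dimm]
        q.append(timestamp)
        while q and q[0] <= timestamp - self.window:
            q.popleft()

    def at_risk(self, dimm, vendor):
        """True once the in-window CE count reaches the vendor's limit."""
        limit = CE_LIMITS.get(vendor, DEFAULT_LIMIT)
        return len(self.events.get(dimm, ())) >= limit
```

A policy like this is only as good as its input: if some CEs are never logged because the polling interval is too long, the in-window count undershoots and the limit is reached late or not at all.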


[Li,Rongqing]


> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette