Re: [PATCH v9 2/3] x86/mce: Add per-bank CMCI storm mitigation

From: Feng Tang
Date: Thu Oct 12 2023 - 02:05:44 EST


On Thu, Oct 12, 2023 at 01:36:41PM +0800, Feng Tang wrote:
> On Thu, Oct 12, 2023 at 01:23:54AM +0800, Luck, Tony wrote:
> > > IIRC, CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B was enabled in 0Day's
> > > kernel config for quite a while, to force each function's start
> > > address aligned on 64 bytes. Don't know if this has been changed
> > > recently.
> > >
> > > Also I noticed the patch introduces a new per-cpu variable 'storm_desc',
> > > if the function address is 64B aligned, then per-cpu data alignment
> > > may be related.
> >
> > If adding (removing, changing) per-cpu variables can cause 8% performance
> > changes, then maybe there needs to be some way to insulate the builds
> > from these side effects (as was done with the 64-byte function alignment).
> > I've no ideas on how to achieve that :-(

As for mitigation (how to reduce this kind of strange performance
change), I proposed "selective isolation" at LPC 2021, on
page 15 of https://lpc.events/event/11/contributions/895/attachments/770/1603/Strange_kernel_performance_changes_lpc_2021.pdf

As the kernel binary is compactly linked together, an alignment change
of text/data in earlier modules can affect the modules after them.
The MCE module hits these strange performance changes much more easily,
just because it sits in the arch/ folder and gets linked early in the
kernel binary.

The idea of "selective isolation" is simple: pick some modules
sparsely and enforce some alignment on their function/data/per-cpu data,
so that they act as fences/borders that separate the kernel into multiple
capsules, and an alignment change only takes effect inside
its own capsule.
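
To make the fence effect concrete, here is a toy model (not kernel code,
just an illustration). It lays out a few modules back to back, grows the
first one by 8 bytes, and reports which modules end up at a different
offset within a cache line. The module sizes, the fence position and the
64-byte fence alignment are all made-up assumptions for the sketch.

```python
CACHE_LINE = 64  # assumed fence alignment, in bytes


def layout(sizes, fences=()):
    """Return the start offset of each module when linked back to back.

    sizes  -- module sizes in bytes, in link order
    fences -- indices of modules whose start is rounded up to CACHE_LINE
    """
    offsets = []
    cursor = 0
    for index, size in enumerate(sizes):
        if index in fences:
            # Round the cursor up to the next cache-line boundary.
            cursor = -(-cursor // CACHE_LINE) * CACHE_LINE
        offsets.append(cursor)
        cursor += size
    return offsets


def shifted_modules(before, after):
    """Indices of modules whose offset within a cache line changed."""
    return [
        i
        for i, (a, b) in enumerate(zip(before, after))
        if a % CACHE_LINE != b % CACHE_LINE
    ]


# Hypothetical module sizes; module 0 stands in for an early-linked
# module, which then grows by 8 bytes (e.g. a new variable is added).
old_sizes = [100, 40, 72, 200, 24, 56]
new_sizes = [108, 40, 72, 200, 24, 56]

no_fence = shifted_modules(layout(old_sizes), layout(new_sizes))
with_fence = shifted_modules(
    layout(old_sizes, fences={3}), layout(new_sizes, fences={3})
)
print("no fence:  ", no_fence)
print("fence at 3:", with_fence)
```

Without a fence every module after the changed one moves within its
cache line; with module 3 acting as a fence, only the modules in the
first capsule do. The model only tracks offsets modulo the fence
alignment, so effects at larger granularity (page, cache set) are out of
its scope.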

Any thoughts? thanks!

- Feng

> Philip helped to clarify that the 0Day kernel build does enable the
> 64-byte function address alignment.

[...]