Re: [PATCH] x86/mce: Dynamically size space for machine check records

From: Tony Luck
Date: Thu Feb 29 2024 - 12:27:04 EST


On Thu, Feb 29, 2024 at 12:42:38AM -0600, Naik, Avadhut wrote:
> Hi,
>
> On 2/28/2024 17:14, Tony Luck wrote:
> > Systems with a large number of CPUs may generate a large
> > number of machine check records when things go seriously
> > wrong. But Linux has a fixed buffer that can only capture
> > a few dozen errors.
> >
> > Allocate space based on the number of CPUs (with a minimum
> > value based on the historical fixed buffer that could store
> > 80 records).
> >
> > Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> > ---
> >
> > Discussion earlier concluded with the realization that it is
> > safe to dynamically allocate the mce_evt_pool at boot time.
> > So here's a patch to do that. Scaling algorithm here is a
> > simple linear "4 records per possible CPU" with a minimum
> > of 80 to match the legacy behavior. I'm open to other
> > suggestions.
> >
> > Note that I threw in a "+1" to the return from ilog2() when
> > calling gen_pool_create(). From reading code, and running
> > some tests, it appears that the min_alloc_order argument
> > needs to be large enough to allocate one of the mce_evt_llist
> > structures.
> >
> > Some other gen_pool users in Linux may also need this "+1".
> >
>
> Somewhat confused here. Weren't we also exploring ways to avoid
> duplicate records from being added to the genpool? Has something
> changed in that regard?

I'm going to cover this in the reply to Boris.

> > + mce_numrecords = max(80, num_possible_cpus() * 4);
> > + mce_poolsz = mce_numrecords * (1 << order);
> > + mce_pool = kmalloc(mce_poolsz, GFP_KERNEL);
>
> To err on the side of caution, wouldn't kzalloc() be a safer choice here?

Seems like too much caution. When mce_gen_pool_add() allocates
an entry from the pool it does:

memcpy(&node->mce, mce, sizeof(*mce));
llist_add(&node->llnode, &mce_event_llist);

between those two lines, every field in the struct mce_evt_llist
is written (including any holes in the struct mce part of the structure).

-Tony