Re: [PATCH] x86/mce: Dynamically size space for machine check records

From: Tony Luck
Date: Thu Feb 29 2024 - 12:22:32 EST


On Wed, Feb 28, 2024 at 05:56:26PM -0800, Sohil Mehta wrote:
> A few other nits.
>
> On 2/28/2024 3:14 PM, Tony Luck wrote:
> > diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
> > index fbe8b61c3413..a1f0a8f29cf5 100644
> > --- a/arch/x86/kernel/cpu/mce/genpool.c
> > +++ b/arch/x86/kernel/cpu/mce/genpool.c
> > @@ -16,14 +16,13 @@
> > * used to save error information organized in a lock-less list.
> > *
> > * This memory pool is only to be used to save MCE records in MCE context.
> > - * MCE events are rare, so a fixed size memory pool should be enough. Use
> > - * 2 pages to save MCE events for now (~80 MCE records at most).
> > + * MCE events are rare, so a fixed size memory pool should be enough.
> > + * Allocate on a sliding scale based on number of CPUs.
> > */
> > -#define MCE_POOLSZ (2 * PAGE_SIZE)
> > +#define MCE_MIN_ENTRIES 80
> >
> > static struct gen_pool *mce_evt_pool;
> > static LLIST_HEAD(mce_event_llist);
> > -static char gen_pool_buf[MCE_POOLSZ];
> >
> > /*
> > * Compare the record "t" with each of the records on list "l" to see if
> > @@ -118,14 +117,25 @@ int mce_gen_pool_add(struct mce *mce)
> >
> > static int mce_gen_pool_create(void)
> > {
> > + int mce_numrecords, mce_poolsz;
>
> Should order be also declared in this line? That way we can have all the
> uninitialized 'int's together.

Sure. Even with the addition of "order" the line is still short enough.

> > struct gen_pool *tmpp;
> > int ret = -ENOMEM;
> > + void *mce_pool;
> > + int order;
> >
> > - tmpp = gen_pool_create(ilog2(sizeof(struct mce_evt_llist)), -1);
> > + order = ilog2(sizeof(struct mce_evt_llist)) + 1;
>
> I didn't exactly understand why a +1 is needed here. Do you have a
> pointer to somewhere to help understand this?
>
> Also, I think, a comment on top might be useful since this isn't obvious.

gen_pool works in power-of-two blocks. The user must pick a specific
size with the gen_pool_create() call. Internally the gen_pool code uses
a bitmap to track which blocks in the pool are allocated/free.

Looking at this specific case, sizeof(struct mce_evt_llist) is 136. So
the original version of this code picks order 7 to allocate in 128 byte
units. But this means that every allocation of a mce_evt_llist will take
two 128-byte blocks.

Net result is that the comment at the top of arch/x86/kernel/cpu/mce/genpool.c
that two pages are enough for ~80 records was wrong when written. At
that point struct mce_evt_llist was below 128, so order was 6, and each
allocation took two blocks. So two pages = 8192 bytes divided by (2 * 64)
results in 64 possible allocations.

But over time Intel and AMD added to the structure. So the current math
comes out at just 32 allocations before the pool is out of space.

Yazen provided the right answer for this. Change to use order_base_2()

> > + tmpp = gen_pool_create(order, -1);
> > if (!tmpp)
> > goto out;
> >
> > - ret = gen_pool_add(tmpp, (unsigned long)gen_pool_buf, MCE_POOLSZ, -1);
> > + mce_numrecords = max(80, num_possible_cpus() * 4);
>
> How about using MCE_MIN_ENTRIES here?

Oops. I meant to do that when I added the #define!

I've also added a "#define MCE_PER_CPU 4" instead of
a raw "4" in that expression.

-Tony