Re: [PATCH] x86/MCE/AMD: Decrement threshold_bank refcount when removing threshold blocks

From: Yazen Ghannam
Date: Wed Oct 26 2022 - 15:46:01 EST


On Wed, Oct 26, 2022 at 08:29:51PM +0200, Borislav Petkov wrote:
> On Wed, Oct 26, 2022 at 03:39:15PM +0000, Yazen Ghannam wrote:
> > What's the issue with my original patch?
>
> Do you see it?
>
> > @@ -1258,10 +1258,10 @@ static void __threshold_remove_blocks(struct threshold_bank *b)
> > struct threshold_block *pos = NULL;
> > struct threshold_block *tmp = NULL;
> >
> > - kobject_del(b->kobj);
> > + kobject_put(b->kobj);
> >
> > list_for_each_entry_safe(pos, tmp, &b->blocks->miscj, miscj)
> > - kobject_del(&pos->kobj);
> > + kobject_put(b->kobj);
>
> You're basically putting the parent as many times as there are elements
> on the ->miscj list.
>
> Basically what Greg doesn't like.
>
> Him and I need to talk it over first whether my gross hack of grafting
> the bank4 kobject hierarchy from CPU0 onto the other CPUs on the node is
> even viable so stay tuned...
>
> > I think this is the simplest way to fix the current implementation.
> > But we should probably get rid of this kobject sharing idea in light
> > of Greg's comments.
>
> You said it. :)
>
> Or maybe do a better one.

Right, so can we do the following two things?

1) Apply the patch I submitted as a simple fix/workaround for the presented
symptom. I tried to keep it small and well described to be a stable backport.
Obviously I wrote it without knowing the shared kobject behavior isn't ideal.

2) Address the shared kobject thing.
Here are some options:
a. Only set up the thresholding kobject on a single CPU per "AMD Node".
Technically MCA Bank 4 is "shared" on legacy systems. But AFAICT from
looking at old BKDG docs, in practice only the "Node Base Core" can access
the registers. This behavior is controlled by a bit in NB which BIOS is
supposed to set. Maybe some BIOSes don't do this, but I think that's a
"broken BIOS on legacy system" issue if so.
b. Disable the MCA Thresholding interface for Families before 0x17. This is
an undocumented interface, and I don't know if anyone is using it on older
systems. The issue we're discussing here started because of a splat during
suspend/resume/CPU hotplug. In disable_err_thresholding(), we disable MCA
Thresholding for bank 4 on Family 15h, so there's some precedent.
c. Do nothing at the moment. I *really* want to clean up the MCA
Thresholding interface, and the shared kobject thing may get resolved in
that.

What do you think?

Thanks,
Yazen