Re: [PATCH] x86/MCE/AMD: Decrement threshold_bank refcount when removing threshold blocks

From: Greg KH
Date: Wed Oct 26 2022 - 08:03:18 EST


On Wed, Oct 26, 2022 at 12:16:40PM +0200, Borislav Petkov wrote:
> On Wed, Jun 15, 2022 at 01:51:47PM +0000, Yazen Ghannam wrote:
> > Yes, I believe that's true based on code inspection. But I'm not aware of any
> > reported issues in this area before the commit listed above. So I decided to
> > switch the Fixes tag from what I had before (shown below). I can switch it
> > back if you think that's best.
> >
> > Fixes: 019f34fccfd5 ("x86, MCE, AMD: Move shared bank to node descriptor")
>
> Yeah, that is probably the culprit. I finally got to this and am able to
> repro on my F10h box.
>
> Here's what I think the fix should be, Greg, please check this
> for no-nos, especially for doing a kobject_put() on the parent in
> remove_shared_bank_kobjects(). But that is basically the reverse
> operation of the kobject_add() I'm doing when sharing the bank, more to
> that below.
>
> I wonder why we see this now - maybe the kobject reference counting got
> changed since then...
>
> Anyway, thoughts?
>
> ---
> From: Borislav Petkov <bp@xxxxxxx>
>
> x86/MCE/AMD: Correctly drop shared bank references
>
> Old AMD machines have a shared MCA bank 4 which reports northbridge
> error types. That bank has a bunch of controls which are exposed this
> way in sysfs on CPU0:
>
> /sys/devices/system/machinecheck/machinecheck0/northbridge/
> ├── dram
> │   ├── error_count
> │   ├── interrupt_enable
> │   └── threshold_limit
> ├── ht_links
> │   ├── error_count
> │   ├── interrupt_enable
> │   └── threshold_limit
> └── l3_cache
> ├── error_count
> ├── interrupt_enable
> └── threshold_limit
>
> In order to expose the exact same controls - the bank is shared
> between all CPUs - threshold_create_bank() reuses the bank pointer and
> kobject_add()s it to the parent of the other CPUs:
>
> mce: threshold_create_bank: CPU1, yes, use it, kref: 4, parent_kref: 3, name: northbridge
> mce: threshold_create_bank: CPU1, inc cpus: 2, bank ref: 4
> mce: __threshold_add_blocks: entry, kobj: 0xffff888100adb218, parent: 0xffff888100c10c00 ref: 1, parent_kref: 4, name: dram
> mce: __threshold_add_blocks: misc, kobj: 0xffff888100adb418, parent: 0xffff888100c10c00, kref: 1, parent_kref: 6, name: l3_cache
> mce: __threshold_add_blocks: misc, kobj: 0xffff888100adb318, parent: 0xffff888100c10c00, kref: 1, parent_kref: 7, name: ht_links
> ...
>
> kobject_add() does a kobject_get() on the parent for each sysfs file it
> adds.
>
> Therefore, in order to unwind the same setup work when the CPU goes
> offline and the bank *references* only are being removed - the other
> CPUs still share it - do a kobject_put() on the parent.

Eeek, no!

You can't decrement the reference on the parent, that could cause you to
get dropped.

And you can not reuse kobjects, is that the issue here? When you are
done with one, you have to delete it. Then create a new one.

No need to move anything around, just destory it all and then add new
ones.

> Rename things more properly while at it and add comments.
>
> Signed-off-by: Borislav Petkov <bp@xxxxxxx>
>
> ---
>
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 1c87501e0fa3..b2bdee9e0bae 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -1241,31 +1241,40 @@ static void threshold_block_release(struct kobject *kobj)
> kfree(to_block(kobj));
> }
>
> +/*
> + * Drop refcounts and delete list heads in order to free the memory.
> + */
> static void deallocate_threshold_blocks(struct threshold_bank *bank)
> {
> + struct list_head *head = &bank->blocks->miscj;
> struct threshold_block *pos, *tmp;
>
> - list_for_each_entry_safe(pos, tmp, &bank->blocks->miscj, miscj) {
> - list_del(&pos->miscj);
> + list_for_each_entry_safe(pos, tmp, head, miscj) {
> kobject_put(&pos->kobj);
> + list_del(&pos->miscj);
> }
>
> kobject_put(&bank->blocks->kobj);
> }
>
> -static void __threshold_remove_blocks(struct threshold_bank *b)
> +/*
> + * Only put the parent kobject of each block. The inverse of kobject_add()
> + * above in threshold_create_bank().
> + */
> +static void remove_shared_bank_kobjects(struct threshold_bank *bank)
> {
> - struct threshold_block *pos = NULL;
> - struct threshold_block *tmp = NULL;
> + struct list_head *head = &bank->blocks->miscj;
> + struct threshold_block *pos, *tmp;
>
> - kobject_del(b->kobj);
> + list_for_each_entry_safe(pos, tmp, head, miscj)
> + kobject_put(pos->kobj.parent);

No, please don't do that.

>
> - list_for_each_entry_safe(pos, tmp, &b->blocks->miscj, miscj)
> - kobject_del(&pos->kobj);
> + kobject_put(bank->kobj);

What changed to cause problems? the kobject reference logic hasn't
changed, was it some topology stuff?

thanks,

greg k-h