RE: [PATCH 2/2] x86/MCE/AMD: Skip creating kobjects with NULL names

From: Ghannam, Yazen
Date: Thu Aug 09 2018 - 14:46:42 EST


> -----Original Message-----
> From: linux-edac-owner@xxxxxxxxxxxxxxx <linux-edac-owner@xxxxxxxxxxxxxxx>
> On Behalf Of Borislav Petkov
> Sent: Thursday, August 9, 2018 11:18 AM
> To: Ghannam, Yazen <Yazen.Ghannam@xxxxxxx>
> Cc: linux-edac@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> tony.luck@xxxxxxxxx; x86@xxxxxxxxxx
> Subject: Re: [PATCH 2/2] x86/MCE/AMD: Skip creating kobjects with NULL
> names
>
> On Thu, Aug 09, 2018 at 09:08:34AM -0500, Yazen Ghannam wrote:
> > From: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> >
> > During mce_threshold_create_device() data structures are allocated for
> > each CPU's MCA banks and thresholding blocks. These data structures are
> > used to save information related to AMD's MCA Error Thresholding
> > feature. The structures are used in the thresholding interrupt handler,
> > and they are exposed to the user through sysfs. The sysfs interface has
> > user-friendly names for each bank.
> >
> > However, errors in mce_threshold_create_device() will cause all the data
> > structures to be deallocated. This will break the thresholding interrupt
> > handler since it depends on these structures.
>
> Same argument as before: if our init fails in some fashion, we should
> not be running the interrupt handler.
>

With this patch, init no longer fails just because some banks don't have
names. The data we cache is useful even if we fail to create sysfs entries for
some banks, and the interrupt handler can work fine without a sysfs entry for
every bank. It seems like overkill to deallocate the structures and sysfs
entries for all the banks just because we can't create entries for the few
banks that don't have names.

In other words, I think we should decouple the interrupt handler from the
sysfs interface. The interface is nice to have but not necessary for the HW
and OS to handle threshold interrupts. Once they are decoupled, new HW with
new, unnamed types will work with older versions of Linux. We can then add the
new type names later without having to backport a fix for the interrupt
handler.
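
To illustrate the decoupling, here is a minimal userspace sketch. It is not
the actual kernel code: struct bank_data, bank_names[], create_sysfs_entry(),
create_device() and threshold_interrupt() are all hypothetical stand-ins, and
the bank names in the table are made up for the example. The point is only
that a NULL name skips the sysfs step while the per-bank data the handler
needs stays allocated.

```c
#include <stddef.h>
#include <stdlib.h>

#define NR_BANKS 4

/* Hypothetical stand-in for the per-bank thresholding data. */
struct bank_data {
	int error_count;	/* cached state used by the interrupt handler */
	int has_sysfs;		/* 1 if a sysfs entry was created for this bank */
};

static struct bank_data *banks[NR_BANKS];

/* Hypothetical name table; NULL models a bank type this kernel can't name. */
static const char *bank_names[NR_BANKS] = {
	"bank_a", NULL, "bank_c", NULL
};

/* Stand-in for kobject/sysfs creation, which cannot work without a name. */
static int create_sysfs_entry(struct bank_data *b, const char *name)
{
	if (!name)
		return -1;
	b->has_sysfs = 1;
	return 0;
}

/* Free every bank allocated so far. */
static void destroy_device(void)
{
	for (int i = 0; i < NR_BANKS; i++) {
		free(banks[i]);
		banks[i] = NULL;
	}
}

static int create_device(void)
{
	for (int i = 0; i < NR_BANKS; i++) {
		banks[i] = calloc(1, sizeof(*banks[i]));
		if (!banks[i]) {
			/* A real allocation failure still fails init. */
			destroy_device();
			return -1;
		}

		/* Unnamed bank: keep the data, skip only the sysfs entry. */
		if (!bank_names[i])
			continue;

		if (create_sysfs_entry(banks[i], bank_names[i])) {
			destroy_device();
			return -1;
		}
	}
	return 0;
}

/* The handler only needs the data structure, never the sysfs entry. */
static void threshold_interrupt(int bank)
{
	if (bank >= 0 && bank < NR_BANKS && banks[bank])
		banks[bank]->error_count++;
}
```

In this sketch, create_device() succeeds even though banks 1 and 3 have no
name, and threshold_interrupt(1) still updates the cached count for bank 1.
Only real failures (allocation, or sysfs creation for a named bank) tear
everything down.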

Thanks,
Yazen