RE: [PATCH 2/2] x86/MCE/AMD: Skip creating kobjects with NULL names

From: Ghannam, Yazen
Date: Thu Aug 16 2018 - 14:46:43 EST


> -----Original Message-----
> From: linux-edac-owner@xxxxxxxxxxxxxxx <linux-edac-owner@xxxxxxxxxxxxxxx>
> On Behalf Of Borislav Petkov
> Sent: Wednesday, August 15, 2018 10:55 AM
> To: Ghannam, Yazen <Yazen.Ghannam@xxxxxxx>
> Cc: linux-edac@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> tony.luck@xxxxxxxxx; x86@xxxxxxxxxx
> Subject: Re: [PATCH 2/2] x86/MCE/AMD: Skip creating kobjects with NULL
> names
>
> On Thu, Aug 09, 2018 at 06:46:26PM +0000, Ghannam, Yazen wrote:
> > This patch makes it so that we don't fail init just because some banks don't
> > have names. The data caching we do is useful even if we fail to create sysfs
> > entries for some banks. The interrupt handler can work fine without a sysfs
> > entry for every bank. It seems like overkill to deallocate all the structures
> > and sysfs entries for all the banks just because we fail to create entries for
> > some banks that don't have names.
>
> Maybe I'm missing the big picture here but why, all of a sudden, are
> some banks without names? This clearly is new because it wasn't like
> that before, so what is it? Maybe you should explain the bigger picture
> first.
>

Yes, you're right. I'll update the commit message.

Before Fam17h, we went generations with only a few MCA banks and their
bank numbers and types were consistent. With Fam17h and SMCA we got a
whole new set of bank types and now these banks may not have the same
ordering between models. So we created names for the new banks and we
match the banks to a name based on values from the MCA_IPID register.

Future SMCA systems may have banks whose MCA_IPID is not known so we
don't match a name to them. This could happen if we have a minor revision
to a known bank, or if we have a completely new bank type.

> And if banks don't have names, then we should generate some for them
> instead. Because this code is already ugly and recursive - we certainly
> don't want to add more conditionals to it because that thing is a mess
> as it is now.
>

This is a good idea.

In the past, the sysfs entries were named something like "th_bank#" before
each bank was given a more descriptive name. We can go back to this format
as a fallback for unrecognized bank types. This will fix the issue with sysfs
failing and also the interface will still work on new hardware.

I can do this and update the commit message in a v2 patch.

> > In other words, I think we should decouple the interrupt handler from the
> > sysfs interface. The interface is nice to have but not necessary for the HW
> > and OS to handle threshold interrupts. If we do so, then new HW with new,
> > unnamed types will work with older versions of Linux.
>
> To tell you the truth, I'm not at all psyched about telling the future.
> We can try to be future-proof to a certain degree but this should not be
> the determining factor how we design things.
>
> But the aspect of decoupling sysfs representation from handler makes
> sense. It just needs to be done cleanly.
>

I agree. I just wanted to make a small change and fix this issue. I think there's
a lot we can do to clean up the thresholding code going forward.

Thanks,
Yazen