Re: [PATCH v2 8/8] x86/MCE/AMD Support new memory interleaving modes during address translation

From: Yazen Ghannam
Date: Wed Sep 23 2020 - 12:25:29 EST


On Wed, Sep 23, 2020 at 10:20:39AM +0200, Borislav Petkov wrote:
> On Thu, Sep 03, 2020 at 08:01:44PM +0000, Yazen Ghannam wrote:
> > From: Muralidhara M K <muralidhara.mk@xxxxxxx>
> >
> > Add support for new memory interleaving modes used in current AMD systems.
> >
> > Check if the system is using a current Data Fabric version or a legacy
> > version as some bit and register definitions have changed.
> >
> > Tested on AMD reference platforms with the following memory interleaving
> > options.
> >
> > Naples
> > - None
> > - Channel
> > - Die
> > - Socket
> >
> > Rome (NPS = Nodes per Socket)
> > - None
> > - NPS0
> > - NPS1
> > - NPS2
> > - NPS4
> >
> > The fixes tag refers to the commit that allows amd64_edac_mod to load on
> > Rome systems.
>
> Err, why? This is adding new stuff to an address translation function.
> How does that fix amd64_edac loading on Rome?
>
> > The module may report an incorrect system addresses on
> > Rome systems depending on the interleaving option used.
>
> That doesn't stop it from loading, sorry.
>

Okay, no problem.

> Now, before you guys do any new features, I'd like you to split this
> humongous function umc_normaddr_to_sysaddr() logically into separate
> helpers and each helper does exactly one thing and one thing only.
>
> Then use a verb in its name: umc_translate_normaddr_to_sysaddr() or so.
>

Okay, will do.

> Also, Yazen, remind me again pls why isn't this function in
> drivers/edac/amd64_edac.c, where it is needed?
>
> If the reason is not valid anymore, let's move it there before splitting
> so that it doesn't bloat the core code.
>

I don't remember the original reason, and I was recently asked about
this code living in a module. I did some looking after this ask, and I
found that we should be using this translation to get a proper value for
the memory error notifiers to use. So I think we still need to use this
function some way with the core code even if the EDAC interface isn't
used.

I think this set can be split up.

1) Set with patches 1-3 fixed up to use cpu_die_id.
2) Set with the address translation updates.
a) Move umc_normaddr_to_sysaddr() into a new module under EDAC.
b) Hook the new module into amd64_edac.c where it's used today.
c) Refactor the code as you suggested above.
d) Add the new features.
3) New set that sets up a proper notifier for the address translation.
a) Unhook the new module from amd64_edac.c.
b) Register a notifer that runs before any notifiers that operate on
memory errors.
c) Find a way to pass the translated address through the chain
without losing the original value.

What do you think?

Thanks,
Yazen