Re: [PATCH v2 1/2] RAS: Introduce AMD Address Translation Library

From: Borislav Petkov
Date: Wed Oct 25 2023 - 06:54:12 EST


On Mon, Oct 16, 2023 at 09:47:09AM -0400, Yazen Ghannam wrote:
> Of course not! :P
>
> I do mean "Supported" though. From the top of MAINTAINERS file:
>
> S: *Status*, one of the following:
> Supported: Someone is actually paid to look after this.

Whatever - I'm not even going to ask why the distinction is being made.
:-\

>
> >> +#define DF_BROADCAST 0xFF
> >> +
> >> +#define DF_FICAA_INST_EN BIT(0)
> >> +#define DF_FICAA_REG_NUM GENMASK(10, 1)
> >> +#define DF_FICAA_FUNC_NUM GENMASK(13, 11)
> >> +#define DF_FICAA_INST_ID GENMASK(23, 16)
> >> +
> >> +/* Register field changed in new systems. */
> >
> > I don't understand that comment.
>
> I'll make it more explicit.
>
> The "REG_NUM" field changed. Please note the slightly different
> bitmasks.

Sure, but that belongs in the changelog - not in a static comment. "new
systems" turns into old systems after a couple of years. :)

> Fair. It took some getting used to, but I've come to prefer the bitops.
> I'd like to keep them if you don't mind.

I guess. They haven't pissed me off that much yet and since you're going
to be staring at that thing...

> Is this so it loads independently?

Yes.

> I'm thinking it should only load as a dependency for other modules.

Then you can express that dependency in those other modules' Kconfig
stanzas. No need to check vendor here.

> If it fails to load, wouldn't modules that depend on it fail to load?

So if this fails loading, then you need to return an error from the
loading routine and the other modules should not load.

If the other modules should still load, then you need to make the
translation lib optional and use the module device table thing so that
the translation lib can exist independently.

However, then you need to handle the case where the other module calls
into the translation lib and at exactly the same time, that library
module is removed. There must be a way to prevent this in the module
code, ref counting perhaps, but you'll have to check how exactly.

> Yes. The intention is to allow any code to use this "library" including
> arch code like MCA.

Ah, then it should be in asm/atl.h

> Genoa is a public name for a particular Server model group. And the
> quirk applies to that group. It doesn't apply to other Zen4 systems like
> Client models, etc.

So Zen4 server. Still better to have the generation and hw type than
some italian name which we will forget.

> Sure, but I don't understand.
>
> Should these be moved to edac.rst? This code isn't part of EDAC.
>
> Or are you suggesting that this new "library" should have a
> Documentation/ entry?

Documentation/ras/ I guess?

> No, PA 0 is a valid address. The physical memory map (at least on x86)
> starts at 0.
>
> We can still get hardware errors for address 0 even though it's part of
> a reserved space. These could be found by patrol scrubbers, etc.

Oh fun.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette