Re: [PATCH 1/2] platform/x86/amd: Introduce AMD Address Translation Library

From: Yazen Ghannam
Date: Mon Aug 07 2023 - 16:44:18 EST


On 8/2/2023 2:55 PM, Yazen Ghannam wrote:
AMD Zen-based systems report memory errors through Machine Check banks
representing Unified Memory Controllers (UMCs). The address value
reported for DRAM ECC errors is a "normalized address" that is relative
to the UMC. This normalized address must be converted to a system
physical address to be usable by the OS.

Support for this address translation was introduced to the MCA subsystem
with Zen1 systems. The code was later moved to the AMD64 EDAC module,
since this was the only user of the code at the time.

However, there are uses for this translation outside of EDAC. The system
physical address can be used in MCA for preemptive page offlining as done
in some MCA notifier functions. Also, this translation is needed as the
basis of similar functionality needed for some CXL configurations on AMD
systems.

Introduce a common address translation library that can be used for
multiple subsystems including MCA, EDAC, and CXL.

Include support for UMC normalized to system physical address
translation for current CPU systems.

Future development to include:
- DF4.5 Non-power-of-2 interleaving modes.
- Heterogeneous CPU+GPU system support.
- CXL translation support.
- Caching of common intermediate values and results.
- Leverage UEFI PRM methods as alternate backends to existing native
code.

Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
---
MAINTAINERS | 7 +
drivers/platform/x86/amd/Kconfig | 1 +
drivers/platform/x86/amd/Makefile | 1 +
drivers/platform/x86/amd/atl/Kconfig | 20 +
drivers/platform/x86/amd/atl/Makefile | 18 +
drivers/platform/x86/amd/atl/access.c | 107 ++++
drivers/platform/x86/amd/atl/core.c | 212 +++++++
drivers/platform/x86/amd/atl/dehash.c | 459 ++++++++++++++
drivers/platform/x86/amd/atl/denormalize.c | 644 ++++++++++++++++++++
drivers/platform/x86/amd/atl/internal.h | 307 ++++++++++
drivers/platform/x86/amd/atl/map.c | 659 +++++++++++++++++++++
drivers/platform/x86/amd/atl/reg_fields.h | 603 +++++++++++++++++++
drivers/platform/x86/amd/atl/system.c | 282 +++++++++
drivers/platform/x86/amd/atl/umc.c | 53 ++
include/linux/amd-atl.h | 18 +


Hi all,

I'd like to get feedback on the most appropriate place for this code.

I want to move this out of EDAC, since it's not really an EDAC feature. And it needs to be used by subsystems other than EDAC.

I thought x86 Platform Drivers, because the code is very platform-specific. And there are already some AMD platform drivers. But there isn't any platform control or management for this translation. It's just reading registers and calculating values. So it's not really a "platform driver" in the sense that it manages platform-specific behavior.

Another option is for this code to be in arch/x86/ras/. But I would like the option for this code to be built as a module, at least for debug and development. And I don't know that modules, nor platform-specific code, should be in arch/.

Currently, I think this could go in drivers/ras/. This address translation is needed for RAS use cases, so making it a part of "RAS Infrastructure" may make the most sense.

Boris, Tony, (and others) what do you think?

Thanks,
Yazen