Re: [PATCH v5 00/17] ARM Error Source Table V2 Support

From: Ruidong Tian
Date: Fri Jan 09 2026 - 07:41:36 EST

On 2026/1/9 18:34, Borislav Petkov wrote:
On Mon, Jan 05, 2026 at 05:12:25PM +0800, Ruidong Tian wrote:
What is a "RAS node"?
A RAS node is the hardware interface for error reporting and control,
consisting of one or more register sets (a collection of RAS records). It is
responsible for error logging and interrupt signaling [0].

OMG, one more meaning for the word "node". Because we're not ambiguous enough.

/facepalm

A single hardware component can feature multiple RAS nodes. For example, a
memory controller is treated as a "RAS device", where each memory channel
has its own RAS node. Interrupts generated by these nodes are typically
aggregated into a single interrupt line managed at the RAS device level.
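The record/node/device hierarchy above can be sketched in userspace C. The sketch also shows a consequence of the shared interrupt line: the device-level handler cannot tell which node fired, so it has to scan every record of every node. All struct names, the ras_device_scan() helper and the ERR_STATUS_V bit are hypothetical illustrations, not the AEST driver's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical "record holds a valid error" bit, for illustration only. */
#define ERR_STATUS_V (1ULL << 30)

struct ras_record {            /* one register set, roughly one x86 MCA bank */
	uint64_t status;
	uint64_t addr;
};

struct ras_node {              /* e.g. one memory channel */
	struct ras_record *records;
	size_t nr_records;
};

struct ras_device {            /* e.g. a memory controller with one IRQ line */
	struct ras_node *nodes;
	size_t nr_nodes;
};

/*
 * Device-level handler sketch: walk all nodes and records, report each
 * valid record, clear it in this model and return how many were found.
 */
size_t ras_device_scan(struct ras_device *dev)
{
	size_t found = 0;

	assert(dev != NULL);

	for (size_t n = 0; n < dev->nr_nodes; n++) {
		struct ras_node *node = &dev->nodes[n];

		for (size_t r = 0; r < node->nr_records; r++) {
			struct ras_record *rec = &node->records[r];

			if (!(rec->status & ERR_STATUS_V))
				continue;
			printf("node %zu record %zu: addr 0x%llx\n", n, r,
			       (unsigned long long)rec->addr);
			rec->status = 0;
			found++;
		}
	}
	return found;
}
```

With two channel nodes that both hold a valid record, a single scan reports both. A second scan finds nothing because the model clears each record after reporting it.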

Nomenclaturial tragedy, I'd say.

Comparison with x86 MCA:

RAS record ≈ MCA bank.
RAS node ≈ a set of MCA banks + CMCI on a core.

The key difference lies in uncore handling: x86 typically maps uncore errors
(like those from a memory controller) into core-based MCA banks. In
contrast, ARM requires uncore components to provide their own standalone RAS
nodes. When a component requires multiple such nodes, they are grouped and
managed as a "RAS device" in the AEST driver.

[0]: https://developer.arm.com/documentation/ihi0100/latest

Yah, thanks for explaining.

The ATL is very AMD-specific. What does "conceptually similar" mean exactly?
By "conceptually similar," I mean that both ARM and AMD share the same
functional requirement: translating between a System Physical Address (SPA)
and a device-specific address (like a DRAM address) for RAS purposes.

The goal here is not to share the hardware-specific translation logic, but
to provide a unified interface (an abstraction layer). The actual
implementation of the translation remains entirely architecture-specific.
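The "unified interface" described above amounts to a generic entry point that dispatches through an ops table, with each architecture registering its own backend. The following is a minimal sketch under that assumption. The addr_xlate_* names and the toy backend are hypothetical and are not taken from the patch set.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical generic hook: device-local address -> SPA. */
struct addr_xlate_ops {
	const char *name;
	int (*to_spa)(uint64_t dev_addr, unsigned int inst, uint64_t *spa);
};

static const struct addr_xlate_ops *xlate_ops; /* one backend per platform */

void addr_xlate_register(const struct addr_xlate_ops *ops)
{
	assert(ops && ops->to_spa);
	xlate_ops = ops;
}

/* Common entry point: knows nothing about the hardware behind it. */
int addr_xlate_to_spa(uint64_t dev_addr, unsigned int inst, uint64_t *spa)
{
	if (!xlate_ops)
		return -ENODEV;
	return xlate_ops->to_spa(dev_addr, inst, spa);
}

/* Toy backend standing in for an architecture-specific implementation. */
static int toy_to_spa(uint64_t dev_addr, unsigned int inst, uint64_t *spa)
{
	*spa = ((uint64_t)inst << 32) | dev_addr; /* placeholder mapping only */
	return 0;
}

const struct addr_xlate_ops toy_xlate_ops = {
	.name   = "toy",
	.to_spa = toy_to_spa,
};
```

The extra indirection buys nothing when each architecture has exactly one caller of its own translation code. That is the overhead the direct-helper suggestion avoids.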

And why do we need an arch-overlapping unified interface?

You can just as well have aest_convert_la_to_spa() and none of that "unifying"
churn.

You're right, that would be much cleaner. I was trying too hard to keep the
interface unified across architectures. In the next version I'll drop the
unified interface and use a direct helper instead. Thanks for the feedback!
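A direct helper needs no registration or ops table: the RAS code simply calls it. The sketch below assumes a memory controller that round-robin interleaves fixed-size chunks across channels. That layout, the struct aest_mc_layout fields and this aest_convert_la_to_spa() signature are all hypothetical. Only the function name comes from the review comment. Real hardware would describe its interleave through firmware tables or registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical interleave description for one memory controller. */
struct aest_mc_layout {
	uint64_t base;          /* SPA where the interleaved region starts */
	unsigned int nr_chans;  /* number of channels interleaved together */
	uint64_t granule;       /* interleave granularity in bytes */
};

/*
 * Convert a channel-local address @la on channel @chan back to an SPA,
 * assuming simple round-robin interleaving of @granule-sized chunks.
 */
uint64_t aest_convert_la_to_spa(const struct aest_mc_layout *mc,
				unsigned int chan, uint64_t la)
{
	uint64_t chunk, offset;

	assert(mc->granule && chan < mc->nr_chans);

	chunk = la / mc->granule;   /* chunk index within this channel */
	offset = la % mc->granule;  /* byte offset inside that chunk */

	return mc->base + (chunk * mc->nr_chans + chan) * mc->granule + offset;
}
```

For example, with base 0x80000000, four channels and a 256-byte granule, local address 0x104 on channel 2 is chunk 1, offset 4. It maps to 0x80000000 + (1 * 4 + 2) * 256 + 4 = 0x80000604.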