Re: [PATCH 2/2] dt-bindings: arm: cpus: Add edac-enabled property
From: Marc Zyngier
Date: Thu Apr 10 2025 - 12:23:35 EST
On Thu, 10 Apr 2025 15:30:17 +0100,
"Tyler Hicks (Microsoft)" <code@xxxxxxxxxxx> wrote:
>
> On 2025-04-10 08:10:18, Marc Zyngier wrote:
> > On Thu, 10 Apr 2025 07:00:55 +0100,
> > Krzysztof Kozlowski <krzk@xxxxxxxxxx> wrote:
> > >
> > > On 10/04/2025 01:36, Vijay Balakrishna wrote:
> > > > From: Sascha Hauer <s.hauer@xxxxxxxxxxxxxx>
> > > >
> > > > Some ARM Cortex CPUs like the A53, A57 and A72 have Error Detection And
> > > > Correction (EDAC) support on their L1 and L2 caches. This is implemented
> > > > in implementation defined registers, so usage of this functionality is
> > > > not safe in virtualized environments or when EL3 already uses these
> > > > registers. This patch adds a edac-enabled flag which can be explicitly
> > > > set when EDAC can be used.
> > >
> > > Can't hypervisor tell you that?
> >
> > No, it can't. This is not an architecture feature, and KVM will gladly
> > inject an UNDEF exception if the guest tries to use this.
> >
> > Which is yet another reason why this whole exercise is futile.
>
> Hi Marc - could you clarify why this is futile for baremetal or were you just
> referring to virtualized environments?

This is futile in general. This sort of stuff only makes sense if you
can take useful action upon detecting an error, such as cache
scrubbing. Here, this is just telling you "bang, you're dead", without
any other recourse. You are not even sure you'll be able to actually
*run* this code. You cannot identify what the blast radius is.

We have some other EDAC implementations for arm64 CPUs (XGene,
ThunderX), and they are all perfectly useless (I have them in my
collection of horrors). I know you are familiar enough with the RAS
architecture to appreciate the difference with a contemporary
implementation that would actually do the right thing.

Thanks,
M.
--
Without deviation from the norm, progress is not possible.