Re: [PATCH 1/2] arm64: errata: Work around AmpereOne's erratum AC03_CPU_36

From: Oliver Upton
Date: Tue Apr 15 2025 - 14:13:13 EST


On Tue, Apr 15, 2025 at 10:30:36AM -0700, D Scott Phillips wrote:
> Oliver Upton <oliver.upton@xxxxxxxxx> writes:
> > On Tue, Apr 15, 2025 at 08:47:10AM -0700, D Scott Phillips wrote:
> >> AC03_CPU_36 can cause asynchronous exceptions to be routed to the wrong
> >> exception level if an async exception coincides with an update to the
> >> controls for the target exception level in HCR_EL2. On affected
> >> machines, always do writes to HCR_EL2 with async exceptions blocked.
> >>
> >> Signed-off-by: D Scott Phillips <scott@xxxxxxxxxxxxxxxxxxxxxx>
> >> ---
> >> arch/arm64/Kconfig | 17 +++++++++++++++++
> >> arch/arm64/include/asm/sysreg.h | 18 ++++++++++++++++--
> >> arch/arm64/kernel/cpu_errata.c | 14 ++++++++++++++
> >> arch/arm64/tools/cpucaps | 1 +
> >> 4 files changed, 48 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> >> index a182295e6f08b..e5fd87446a3b8 100644
> >> --- a/arch/arm64/Kconfig
> >> +++ b/arch/arm64/Kconfig
> >> @@ -445,6 +445,23 @@ menu "Kernel Features"
> >>
> >> menu "ARM errata workarounds via the alternatives framework"
> >>
> >> +config AMPERE_ERRATUM_AC03_CPU_36
> >> + bool "AmpereOne: AC03_CPU_36: CPU can take an invalid exception, if an asynchronous exception to EL2 occurs while EL2 software is changing the EL2 exception controls."
> >> + default y
> >> + help
> >> + This option adds an alternative code sequence to work around Ampere
> >> + erratum AC03_CPU_36 on AmpereOne.
> >> +
> >> + If an async exception happens at the same time as an update to the
> >> + controls for the target EL for async exceptions, an exception can be
> >> + delivered to the wrong EL. For example, an exception may be routed from EL2
> >> + to EL1.
> >> +
> >> + The workaround masks all asynchronous exception types when writing
> >> + to HCR_EL2.
> >> +
> >> + If unsure, say Y.
> >> +
> >> config AMPERE_ERRATUM_AC03_CPU_38
> >> bool "AmpereOne: AC03_CPU_38: Certain bits in the Virtualization Translation Control Register and Translation Control Registers do not follow RES0 semantics"
> >> default y
> >> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> >> index 2639d3633073d..e7781f7e7f7a7 100644
> >> --- a/arch/arm64/include/asm/sysreg.h
> >> +++ b/arch/arm64/include/asm/sysreg.h
> >> @@ -1136,14 +1136,28 @@
> >> __val; \
> >> })
> >>
> >> +#define __sysreg_is_hcr_el2(r) \
> >> + (__builtin_strcmp("hcr_el2", __stringify(r)) == 0)
> >
> > This looks fragile. What about:
> >
> > write_sysreg(hcr, HCR_EL2);
> >
> > or:
> >
> > write_sysreg_s(hcr, SYS_HCR_EL2);
>
> I had also thought about changing the users of write_sysreg(..hcr_el2)
> to some new function write_hcr_el2() or something, but I guess that
> would have the same fragility. Any suggestions on a better way? Try
> harder with the string matching, or do something else entirely?

I think the least bad approach would be to convert to HCR-specific
accessors. It's the most likely to encourage folks to respect the erratum
mitigation, and it keeps the ugliness out of unrelated common helpers.

Thanks,
Oliver