Re: [RFC PATCH] KVM: arm64: Workaround for Ampere AC03_CPU_36 (exception taken to an incorrect EL)

From: Oliver Upton
Date: Sat Jan 06 2024 - 12:50:42 EST


On Sat, Jan 06, 2024 at 12:13:09PM +0000, Marc Zyngier wrote:

[...]

> > From 265cb193190c13c651d8e008d34d1d18505d4804 Mon Sep 17 00:00:00 2001
> > From: Oliver Upton <oliver.upton@xxxxxxxxx>
> > Date: Fri, 5 Jan 2024 23:18:14 +0000
> > Subject: [PATCH] KVM: arm64: Mitigate AmpereOne erratum AC03_CPU_36
> >
> > The AmpereOne design suffers from an erratum where if an asynchronous
> > exception arrives while EL2 is modifying hypervisor exception controls
> > (i.e. HCR_EL2, SCTLR_EL2) the PE may take an invalid exception to
> > another EL.
>
> Same questions about SCTLR_EL2 and the notion of "another EL".

I've got the same questions :) This is just a rewording of Ampere's
erratum description.

https://amperecomputing.com/customer-connect/products/AmpereOne-device-documentation

> Other than the passing comments, I'm OK with this patch. However, I am
> very worried that this is only the start of a very long game of
> whack-a-mole, because there is no actual documentation on what goes
> wrong.
>
> For example, we have plenty of writes to SCTLR_EL2 (using the
> SCTLR_EL1 alias if running VHE) for MTE. Are any of those affected?
>
> Short of having some solid handle on what is happening, I don't see
> how we can promise to support this system.

Completely agree. At least on the AmpereOne machines I have access to,
this patch seems to do the trick, but that observation is no
replacement for full documentation.

--
Thanks,
Oliver