Re: [PATCH] arm64: Add KRYO4XX gold CPU core to spectre-v2 safe list

From: Sai Prakash Ranjan
Date: Fri Aug 14 2020 - 00:38:19 EST

On 2020-08-13 23:29, Marc Zyngier wrote:
On 2020-08-13 13:33, Sai Prakash Ranjan wrote:
On 2020-08-13 16:09, Marc Zyngier wrote:
On 2020-08-13 10:40, Will Deacon wrote:
On Thu, Aug 13, 2020 at 02:49:37PM +0530, Sai Prakash Ranjan wrote:
On 2020-08-13 14:33, Will Deacon wrote:
> On Thu, Aug 13, 2020 at 01:48:34PM +0530, Sai Prakash Ranjan wrote:
> > KRYO4XX gold/big CPU cores are based on Cortex-A76, which has the CSV2
> > bits set and is spectre-v2 safe. But on big.LITTLE systems where they
> > are coupled with other CPU cores, such as the KRYO4XX silver cores
> > based on Cortex-A55 (which are spectre-v2 safe but do not have the CSV2
> > bits set), the system-wide safe value will be set to the lowest value
> > of the CSV2 bits, as per the FTR_LOWER_SAFE policy defined for the CSV2
> > bits of register ID_AA64PFR0_EL1.
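To make the FTR_LOWER_SAFE behaviour concrete, here is a minimal userspace sketch (not kernel code) of how a lower-safe 4-bit field is reduced across CPUs. The helper names are hypothetical; the only kernel fact relied on is that CSV2 occupies bits [59:56] of ID_AA64PFR0_EL1. The kernel's cpufeature code performs this merge incrementally as each CPU is brought up rather than in a single loop like this one.

```c
#include <stddef.h>
#include <stdint.h>

/* CSV2 occupies bits [59:56] of ID_AA64PFR0_EL1. */
#define ID_AA64PFR0_CSV2_SHIFT 56

/* Extract the 4-bit CSV2 field from a raw ID_AA64PFR0_EL1 value. */
static unsigned int csv2_field(uint64_t pfr0)
{
	return (unsigned int)((pfr0 >> ID_AA64PFR0_CSV2_SHIFT) & 0xf);
}

/*
 * Hypothetical model of FTR_LOWER_SAFE: the system-wide (sanitised)
 * value of the field is the lowest value reported by any CPU.
 * Returns 0xf (the field maximum) when ncpus is 0.
 */
static unsigned int sanitised_csv2(const uint64_t *pfr0, size_t ncpus)
{
	unsigned int safe = 0xf;

	for (size_t i = 0; i < ncpus; i++) {
		unsigned int v = csv2_field(pfr0[i]);

		if (v < safe)
			safe = v;
	}
	return safe;
}
```

With, for example, six CPUs reporting CSV2 = 0 and two reporting CSV2 = 1, the sanitised value is 0, and that is the value KVM then exposes to every vCPU.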
> >
> > This is a problem when booting a guest kernel on the gold CPU cores:
> > it will incorrectly report an ARM_SMCCC_ARCH_WORKAROUND_1 warning and
> > consider them vulnerable to Spectre variant 2, due to the system-wide
> > safe value that the KVM emulation code uses when reading ID registers.
> > One wrong way of fixing this is to set FTR_HIGHER_SAFE for the CSV2
> > bits, so instead add the KRYO4XX gold CPU core to the safe list, which
> > is consulted even when the sanitised read reports that the CSV2 bits
> > are not set for KRYO4XX gold cores.
> >
> > Reported-by: Stephen Boyd <swboyd@xxxxxxxxxxxx>
> > Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@xxxxxxxxxxxxxx>
> > ---
> > arch/arm64/kernel/cpu_errata.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> > index 6bd1d3ad037a..6cbdd2d98a2a 100644
> > --- a/arch/arm64/kernel/cpu_errata.c
> > +++ b/arch/arm64/kernel/cpu_errata.c
> > @@ -545,6 +545,7 @@ static const struct midr_range spectre_v2_safe_list[] = {
> We shouldn't be putting CPUs in the safe list when they have CSV2 reporting
> that they are mitigated in hardware, so I don't think this is the right
> approach.

Ok, but the only thing I find wrong with this approach is that it is
redundant information, because CSV2 already advertises the mitigation. Then
again, the CSV2 check is done first, so it doesn't really hurt to add the
core to the safe list, since we already know that it is safe.

It simply doesn't scale. That's why CSV2 exists in the first place, so we
don't have to modify the kernel every time a new CPU is invented.

> Sounds more like KVM should advertise CSV2 for the vCPUs if all of the
> physical CPUs without CSV2 set are on the safe list. But then again, KVM
> has always been slightly in denial about big.LITTLE because you can't
> sensibly expose it to a guest if there are detectable differences...
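The suggestion above can be read as a system-wide predicate: a physical CPU is unaffected if it reports CSV2 (checked first) or if its MIDR is on the safe list, and KVM could advertise CSV2 to vCPUs only if every physical CPU is unaffected. A minimal sketch under stated assumptions: `struct cpu_desc`, the helper names and the MIDR values are all hypothetical, and matching compares only the MIDR implementer and part-number fields, whereas the kernel's `struct midr_range` can also match revision ranges.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ID_AA64PFR0_CSV2_SHIFT 56

/* Hypothetical per-CPU description: raw ID_AA64PFR0_EL1 and MIDR_EL1. */
struct cpu_desc {
	uint64_t pfr0;
	uint32_t midr;
};

/* Keep MIDR implementer [31:24] and part number [15:4] only. */
static uint32_t midr_model(uint32_t midr)
{
	return midr & 0xff00fff0u;
}

/* Safe-list entries are assumed to be stored already masked. */
static bool on_safe_list(uint32_t midr, const uint32_t *safe, size_t nsafe)
{
	for (size_t i = 0; i < nsafe; i++)
		if (midr_model(midr) == safe[i])
			return true;
	return false;
}

/* CSV2 is consulted first; the safe list is only a fallback. */
static bool cpu_unaffected(const struct cpu_desc *c,
			   const uint32_t *safe, size_t nsafe)
{
	if ((c->pfr0 >> ID_AA64PFR0_CSV2_SHIFT) & 0xf)
		return true;
	return on_safe_list(c->midr, safe, nsafe);
}

/* True only if every physical CPU is unaffected. */
static bool guest_may_see_csv2(const struct cpu_desc *cpus, size_t ncpus,
			       const uint32_t *safe, size_t nsafe)
{
	for (size_t i = 0; i < ncpus; i++)
		if (!cpu_unaffected(&cpus[i], safe, nsafe))
			return false;
	return true;
}
```

Note that a CPU with CSV2 set never needs a safe-list entry under this predicate, which is the scaling argument made earlier in the thread.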

Sorry, but I don't see how the guest kernel will see the CSV2 bits set for
the gold CPU cores without either adding them to the safe list or reading
the unsanitised value of ID_AA64PFR0_EL1?

Well, that's for somebody to figure out in the patch. I'm just saying that
adding cores to the safe list when they already have a CSV2 field conveying
the same information is the wrong approach. The right approach is for KVM
to expose CSV2 as set when the system is not affected by the erratum.

A sensible way to fix this would be something like this:

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 077293b5115f..2735db21ff0d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1131,6 +1131,9 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
 		if (!vcpu_has_sve(vcpu))
 			val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
 		val &= ~(0xfUL << ID_AA64PFR0_AMU_SHIFT);
+		if (!(val & (0xfUL << ID_AA64PFR0_CSV2_SHIFT)) &&
+		    get_spectre_v2_workaround_state() == ARM64_BP_HARDEN_NOT_REQUIRED)
+			val |= (1UL << ID_AA64PFR0_CSV2_SHIFT);
 	} else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) {
 		val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) |
Thanks Marc, I gave this a go on SC7180 where the issue was seen and
this fix is good.

Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@xxxxxxxxxxxxxx>

There is still a problem with this approach. A late CPU could
come up after a guest has been started. If that CPU is identified
as vulnerable by the host kernel, get_spectre_v2_workaround_state()
will return a different value, breaking the guest (or, more
likely, leaving it exposed to Spectre-v2 attacks).

We'd need to disable the late onlining of CPUs that would change
the mitigation state, and this is... ugly.
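The late-onlining problem amounts to a consistency check. Below is a hypothetical sketch (none of these names exist in the kernel) of what refusing such a CPU might look like: once a guest may have read CSV2 as set, a late CPU that the host would classify as vulnerable is rejected instead of silently changing the mitigation state. It only illustrates the constraint described above; it says nothing about how, or whether, the kernel should implement it.

```c
#include <stdbool.h>

/* Hypothetical host bookkeeping for the late-CPU scenario. */
struct host_state {
	bool all_cpus_unaffected; /* mitigation state derived from CPUs seen so far */
	bool guests_saw_csv2;     /* a guest may already have read CSV2 == 1 */
};

/* Returns true if the late CPU may be brought online. */
static bool may_online_late_cpu(struct host_state *hs, bool cpu_unaffected)
{
	if (cpu_unaffected)
		return true;
	if (hs->guests_saw_csv2)
		return false; /* would invalidate what guests were already told */
	hs->all_cpus_unaffected = false;
	return true;
}
```

An unaffected late CPU is never refused here; only a vulnerable one arriving after guests have seen CSV2 is.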

Ugh, yes indeed, and here I was thinking these things were straightforward :(


QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation