Re: [PATCH 1/2] arm64: ARM_SMCCC_ARCH_WORKAROUND_1 doesn't return SMCCC_RET_NOT_REQUIRED

From: Stephen Boyd
Date: Wed Oct 21 2020 - 12:12:16 EST


Quoting Will Deacon (2020-10-21 08:49:09)
> On Wed, Oct 21, 2020 at 08:23:54AM -0700, Stephen Boyd wrote:
> >
> > If I'm reading the TF-A code correctly it looks like this will return
> > SMC_UNK if the platform decides that "This flag can be set to 0 by the
> > platform if none of the PEs in the system need the workaround." Where
> > the flag is WORKAROUND_CVE_2017_5715 and the call handler returns 1 if
> > the errata doesn't apply but the config is enabled, 0 if the errata
> > applies and the config is enabled, or SMC_UNK (I guess this is
> > NOT_SUPPORTED?) if the config is disabled[2].
> >
> > So TF-A could disable this config and then the kernel would think it is
> > vulnerable when it actually isn't? The spec is a pile of ectoplasma
> > here.
>
> Yes, but there's not a lot we can do in that case as we rely on the
> firmware to tell us whether or not we're affected. We do have the
> "safelist" as a last resort, but that's about it.

There are quite a few platforms that set this config to 0. Should they
be setting it to 1?

tf-a $ git grep WORKAROUND_CVE_2017_5715 -- **/platform.mk | wc -l
17

This looks like a disconnect between kernel and TF-A but I'm not aware
of all the details here.

>
> >
> > Does the kernel implement a workaround in the case that no guest PE is
> > affected? If so then returning 1 sounds OK to me, but otherwise
> > NOT_SUPPORTED should work per the spec.
>
> I don't follow you here. The spec says that "SMCCC_RET_NOT_SUPPORTED" is
> valid return code in the case that "The system contains at least 1 PE
> affected by CVE-2017-5715 that has no firmware mitigation available."
> and do the guest would end up in the "vulnerable" state.
>

Returning 1 says "SMCCC_ARCH_WORKAROUND_1 can be invoked safely on all
PEs in the system" so I am not sure that invoking it is from a guest is
safe on systems that don't require the workaround? If it is always safe
to invoke the call from guest to host then returning 1 should be fine
here.

My read of the spec was that the intent is to remove the call at some
point and have the removal of the call mean that it isn't vulnerable.
Because NOT_SUPPORTED per the spec means "not needed", "maybe needed",
or "firmware doesn't know". Aha maybe they wanted us to make the call on
each CPU (i.e. PE) and then if any of them return 0 we should consider
it vulnerable and if they return NOT_SUPPORTED we should keep calling
for each CPU until we are sure we don't see a 0 and only see a 1 or
NOT_SUPPORTED? Looks like a saturating value sort of thing, across CPUs
that we care/know about.