Re: [PATCH v3 4/5] arm64: Add support for SMCCC TRNG entropy source

From: André Przywara
Date: Fri Nov 20 2020 - 05:53:01 EST


On 19/11/2020 13:41, Ard Biesheuvel wrote:

Hi,

> On Fri, 13 Nov 2020 at 19:24, Andre Przywara <andre.przywara@xxxxxxx> wrote:
>>
>> The ARM architected TRNG firmware interface, described in ARM spec
>> DEN0098, defines an ARM SMCCC based interface to a true random number
>> generator, provided by firmware.
>> This can be discovered via the SMCCC >=v1.1 interface, and provides
>> up to 192 bits of entropy per call.
>>
>> Hook this SMC call into arm64's arch_get_random_*() implementation,
>> coming to the rescue when the CPU does not implement the ARM v8.5 RNG
>> system registers.
>>
>> For the detection, we piggy back on the PSCI/SMCCC discovery (which gives
>> us the conduit to use (hvc/smc)), then try to call the
>> ARM_SMCCC_TRNG_VERSION function, which returns -1 if this interface is
>> not implemented.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@xxxxxxx>
>> ---
>> arch/arm64/include/asm/archrandom.h | 69 ++++++++++++++++++++++++-----
>> 1 file changed, 58 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/archrandom.h b/arch/arm64/include/asm/archrandom.h
>> index abe07c21da8e..fe34bfd30caa 100644
>> --- a/arch/arm64/include/asm/archrandom.h
>> +++ b/arch/arm64/include/asm/archrandom.h
>> @@ -4,13 +4,24 @@
>>
>> #ifdef CONFIG_ARCH_RANDOM
>>
>> +#include <linux/arm-smccc.h>
>> #include <linux/bug.h>
>> #include <linux/kernel.h>
>> #include <asm/cpufeature.h>
>>
>> +#define ARM_SMCCC_TRNG_MIN_VERSION 0x10000UL
>> +
>> +extern bool smccc_trng_available;
>> +
>> static inline bool __init smccc_probe_trng(void)
>> {
>> - return false;
>> + struct arm_smccc_res res;
>> +
>> + arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_VERSION, &res);
>> + if ((s32)res.a0 < 0)
>> + return false;
>> +
>> + return res.a0 >= ARM_SMCCC_TRNG_MIN_VERSION;
>> }
>>
>> static inline bool __arm64_rndr(unsigned long *v)
>> @@ -43,26 +54,52 @@ static inline bool __must_check arch_get_random_int(unsigned int *v)
>>
>> static inline bool __must_check arch_get_random_seed_long(unsigned long *v)
>> {
>> + struct arm_smccc_res res;
>> +
>> /*
>> * Only support the generic interface after we have detected
>> * the system wide capability, avoiding complexity with the
>> * cpufeature code and with potential scheduling between CPUs
>> * with and without the feature.
>> */
>> - if (!cpus_have_const_cap(ARM64_HAS_RNG))
>> - return false;
>> + if (cpus_have_const_cap(ARM64_HAS_RNG))
>> + return __arm64_rndr(v);
>>
>> - return __arm64_rndr(v);
>> -}
>> + if (smccc_trng_available) {
>> + arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_RND64, 64, &res);
>> + if ((int)res.a0 < 0)
>> + return false;
>>
>> + *v = res.a3;
>> + return true;
>> + }
>> +
>> + return false;
>> +}
>>
>
> I think we should be more rigorous here in how we map the concepts of
> random seeds and random numbers onto the various sources.
>
> First of all, assuming my patch dropping the call to
> arch_get_random_seed_long() from add_interrupt_randomness() gets
> accepted, we should switch to RNDRRS here, and implement the non-seed
> variants using RNDR.

I agree (and have a patch ready), but that seems independent from this
series.

> However, this is still semantically inaccurate: RNDRRS does not return
> a random *seed*, it returns a number drawn from a freshly seeded
> pseudo-random sequence. This means that the TRNG interface, if
> implemented, is a better choice, and so we should try it first. Note
> that on platforms that don't implement both, only one of these will be
> available in the first place. But on platforms that *do* implement
> both, the firmware interface may actually be less wasteful in terms of
> resources: the TRNG interface returns every bit drawn from the
> underlying entropy source, whereas RNDRRS uses ~500 bits of entropy to
> reseed a DRBG that gets used only once to draw a single 64-bit number.

I am not sure I share your enthusiasm about the quality of the actual
TRNG firmware implementations, but we can go with this for now.
Maybe if we see bad implementations in the future, we can revisit this,
and have some tuneables? Or a command line option to ignore the SMCCC
interface? Or use the UUID mechanism for that?

> And the cost of the SMCCC call in terms of CPU time is charged to the
> caller, which is appropriate here.

This still leaves the problem that the core might be stuck in EL3 for an
unknown period of time, impeding our realtime efforts.
Do we have some ball park number of a number of cycles spent in EL3
still being acceptable? That could serve as a guideline for firmware
implementations?

> Then, I don't think we should ever return false without even trying if
> RNDRRS is available if the SMCCC invocation fails.

That's a good point.

> Something like this perhaps?
>
> if (smccc_trng_available) {
> arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_RND64, 64, &res);
> if ((int)res.a0 >= 0) {
> *v = res.a3;
> return true;
> }
> }

yeah, the fall-through here is a good idea.

>
> if (cpus_have_const_cap(ARM64_HAS_RNG))
> return __arm64_rndrrs(v);
>
> return false;

So I wonder if we have a trade-off here between the performance of the
RNDRRS entropy source and the latency of the firmware implementation.
If the RNDR entropy source delivers Mbits/s (the Juno h/w definitely
does), we might just not care about throwing away some (or even the
majority) of it.

On the other hand the Juno TRNG hardware for instance spends already
hundreds of cycles on a single 32-bit MMIO read alone, just to transfer
the bits into the CPU. Having a rather large pool could avoid paying
this price on every SMC call, but I don't see a nice way of allowing
TF-A to fill this pool, when Linux thinks it can spare the time.

So I am a bit uneasy about unconditionally preferring the SMCCC
implementation over the RNDRRS instruction.

Cheers,
Andre

> (and something similar 2x below)
>
>
>> static inline bool __must_check arch_get_random_seed_int(unsigned int *v)
>> {
>> + struct arm_smccc_res res;
>> unsigned long val;
>> - bool ok = arch_get_random_seed_long(&val);
>>
>> - *v = val;
>> - return ok;
>> + if (cpus_have_const_cap(ARM64_HAS_RNG)) {
>> + if (arch_get_random_seed_long(&val)) {
>> + *v = val;
>> + return true;
>> + }
>> + return false;
>> + }
>> +
>> + if (smccc_trng_available) {
>> + arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_RND64, 32, &res);
>> + if ((int)res.a0 < 0)
>> + return false;
>> +
>> + *v = res.a3 & GENMASK(31, 0);
>> + return true;
>> + }
>> +
>> + return false;
>> }
>>
>> static inline bool __init __early_cpu_has_rndr(void)
>> @@ -77,10 +114,20 @@ arch_get_random_seed_long_early(unsigned long *v)
>> {
>> WARN_ON(system_state != SYSTEM_BOOTING);
>>
>> - if (!__early_cpu_has_rndr())
>> - return false;
>> + if (__early_cpu_has_rndr())
>> + return __arm64_rndr(v);
>> +
>> + if (smccc_trng_available) {
>> + struct arm_smccc_res res;
>>
>> - return __arm64_rndr(v);
>> + arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_RND64, 64, &res);
>> + if ((int)res.a0 >= 0) {
>> + *v = res.a3;
>> + return true;
>> + }
>> + }
>> +
>> + return false;
>> }
>> #define arch_get_random_seed_long_early arch_get_random_seed_long_early
>>
>> --
>> 2.17.1
>>