Re: Should Linux set the new constant-time mode CPU flags?

From: Catalin Marinas
Date: Thu Sep 15 2022 - 13:19:06 EST


(catching up with this thread)

On Fri, Aug 26, 2022 at 10:45:07AM +0200, Arnd Bergmann wrote:
> On Fri, Aug 26, 2022 at 1:15 AM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> > For arm64, it's not clear to me whether the DIT flag is privileged or not. If
> > privileged, I expect it would need to be set by the kernel just like the Intel
> > flag. If unprivileged, I expect there will still be work to do in the kernel,
> > as the flag will need to be set when running any crypto code in the kernel.
>
> 7206dc93a58f ("arm64: Expose Arm v8.4 features") added the feature bit for
> Armv8.4+ processors. From what I can tell from the documentation and the
> kernel source, I see:
>
> - if the feature is set in HWCAP (or /proc/cpuinfo), then the DIT
> register is accessible from user space, and sensitive code can set or clear the
> constant-time mode for the local thread.

Indeed, the arm64 DIT feature can be enabled in user space, subject to
checking the HWCAP bit or the CPUID regs (via kernel trapping and
emulation). The expectation was that some crypto routines would set it
on function entry, restore it on return but...
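As an illustration of that "set on entry, restore on return" pattern, here is a minimal user-space sketch. It assumes Linux on arm64 with glibc's getauxval(). The HWCAP_DIT fallback value (bit 24) and the S3_3_C4_C2_5 encoding of the DIT register are my reading of the arm64 uapi and Arm definitions. The names dit_enter(), dit_leave() and ct_memeq() are hypothetical. On non-arm64 builds the register is replaced by a plain variable so the logic can still be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PSTATE.DIT is bit 24 of the value read/written via the DIT register. */
#define DIT_BIT (1UL << 24)

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#ifndef HWCAP_DIT
#define HWCAP_DIT (1UL << 24) /* assumed value from arm64 <asm/hwcap.h> */
#endif

static bool cpu_has_dit(void)
{
	return (getauxval(AT_HWCAP) & HWCAP_DIT) != 0;
}

/* Generic sysreg encoding, so no -march=armv8.4-a is needed to assemble. */
static uint64_t dit_read(void)
{
	uint64_t v;
	__asm__ volatile("mrs %0, S3_3_C4_C2_5" : "=r"(v) : : "memory");
	return v;
}

static void dit_write(uint64_t v)
{
	__asm__ volatile("msr S3_3_C4_C2_5, %0" : : "r"(v) : "memory");
}
#else
/* Hypothetical stand-in for non-arm64 builds: model DIT in a variable. */
static uint64_t fake_dit;
static bool cpu_has_dit(void) { return true; }
static uint64_t dit_read(void) { return fake_dit; }
static void dit_write(uint64_t v) { fake_dit = v; }
#endif

/* Turn DIT on (only if the CPU has it) and return the caller's previous value. */
static uint64_t dit_enter(void)
{
	if (!cpu_has_dit())
		return 0;
	uint64_t prev = dit_read();
	dit_write(prev | DIT_BIT);
	return prev;
}

/* Restore whatever DIT state the caller had on entry. */
static void dit_leave(uint64_t prev)
{
	if (cpu_has_dit())
		dit_write(prev);
}

/* Example sensitive routine: compare two buffers without early exit. */
int ct_memeq(const void *a, const void *b, size_t n)
{
	const unsigned char *x = a, *y = b;
	unsigned char diff = 0;
	uint64_t saved = dit_enter();

	/* Sanity check: DIT must be on while the sensitive loop runs. */
	assert(!cpu_has_dit() || (dit_read() & DIT_BIT) != 0);
	for (size_t i = 0; i < n; i++)
		diff |= x[i] ^ y[i];

	dit_leave(saved);
	return diff == 0;
}
```

Real crypto code would more likely do this in assembly: in C, the compiler may still schedule some of the arithmetic outside the window between the two MSR instructions.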

> - On CPUs without the feature (almost all ARMv8 ones), the register should
> not be touched.

That's one of the drawbacks of using the feature in user space (the
instruction is not in the hint/NOP space, so it is undefined on CPUs
without DIT). It can be worked around with ifunc resolvers, at the cost
of a slight overhead on each function call.
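The ifunc workaround could look roughly like the sketch below, assuming GCC and glibc. On arm64, glibc passes AT_HWCAP (possibly with extra flag bits set) as the resolver's first argument. xor_sum() is a hypothetical stand-in for a sensitive routine. The resolver runs once at load/startup time. After that, every call goes through an indirect jump to the selected version, which is where the small call overhead comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef HWCAP_DIT
#define HWCAP_DIT (1UL << 24) /* assumed value from arm64 <asm/hwcap.h> */
#endif

/* Plain version: safe on CPUs without DIT, where MSR DIT would be undefined. */
static int xor_sum_plain(const unsigned char *p, size_t n)
{
	unsigned char acc = 0;

	assert(p != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		acc ^= p[i];
	return acc;
}

#if defined(__aarch64__)
/* DIT version: save DIT, set it, run the plain code, restore it. */
static int xor_sum_dit(const unsigned char *p, size_t n)
{
	uint64_t prev;

	__asm__ volatile("mrs %0, S3_3_C4_C2_5" : "=r"(prev) : : "memory");
	__asm__ volatile("msr S3_3_C4_C2_5, %0" : : "r"(prev | (1UL << 24)) : "memory");
	int r = xor_sum_plain(p, n);
	__asm__ volatile("msr S3_3_C4_C2_5, %0" : : "r"(prev) : "memory");
	return r;
}

/* glibc on arm64 hands the resolver AT_HWCAP as its first argument. */
static int (*resolve_xor_sum(uint64_t hwcap))(const unsigned char *, size_t)
{
	return (hwcap & HWCAP_DIT) ? xor_sum_dit : xor_sum_plain;
}
#else
/* Other architectures: no DIT, always pick the plain version. */
static int (*resolve_xor_sum(void))(const unsigned char *, size_t)
{
	return xor_sum_plain;
}
#endif

/* Callers just call xor_sum(); the loader binds it to the resolver's choice. */
int xor_sum(const unsigned char *p, size_t n) __attribute__((ifunc("resolve_xor_sum")));
```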

> - The bit is context switched on kernel entry, so setting the bit in user space
> does not change the behavior inside of a syscall
> - If we add a user space interface for setting the bit per thread on x86,
> the same interface could be supported to set the bit on arm64 to save
> user space implementations the trouble of checking the feature bits

A prctl() would do here, but I think the default should be off, or we
should at least allow a sysctl to control it. Enabling DIT could have a
small performance impact, while many (most?) apps don't need such
guarantees.

For arm64, my preference is to have this option per-thread and even be
able to toggle it within a thread (not sure that's possible on x86
without a syscall).

Other random ideas for deploying this (on arm64): have an ELF annotation
indicating that data-independent timing is required. If it's on the main
executable, the kernel could turn DIT on for the app. If it's on a
(crypto) library, it's up to the dynamic loader to either turn it on for
the whole app or just use some function veneers that save/restore it
while the library code is executing.
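To make the decision tree in that idea concrete, a small model might look like the sketch below. It is entirely hypothetical: no such ELF annotation exists, and the enum and choose_dit_action() names are invented for illustration. It only encodes who would turn DIT on in each case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of the proposed ELF "needs DIT" annotation. */
enum dit_action {
	DIT_OFF,              /* nothing is annotated: leave DIT off */
	DIT_KERNEL_AT_EXEC,   /* main executable annotated: kernel enables it */
	DIT_LOADER_WHOLE_APP, /* library annotated: loader enables it process-wide */
	DIT_LOADER_VENEERS,   /* library annotated: loader wraps library entry points */
};

enum dit_action choose_dit_action(bool exe_marked, bool any_lib_marked,
				  bool loader_prefers_veneers)
{
	/* An annotated executable covers everything, libraries included. */
	if (exe_marked)
		return DIT_KERNEL_AT_EXEC;
	/* Otherwise it is the dynamic loader's policy choice. */
	if (any_lib_marked)
		return loader_prefers_veneers ? DIT_LOADER_VENEERS
					      : DIT_LOADER_WHOLE_APP;
	return DIT_OFF;
}
```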

I assume having this per-thread would work on x86 as well but I'm not
sure about the context switching cost.

> - the in-kernel crypto code does not set the bit today but could be easily
> changed to do this for CPUs that support it, if we can decide on a policy
> for when to enable or disable it.

In the kernel it's easier, at least for arm64, to enable it for specific
functions (we can do boot-time code patching).

Whichever way we support it, I'd rather not turn it on by default.
From talking to some of the Arm microarchitects, such a feature may
prevent certain hardware optimisations.

--
Catalin