Re: [RFC PATCH] arm64: Enable data independent timing (DIT) in the kernel

From: Mark Kettenis
Date: Thu Oct 27 2022 - 10:45:49 EST


> From: Ard Biesheuvel <ardb@xxxxxxxxxx>
> Date: Thu, 27 Oct 2022 15:17:56 +0200
>
> On Thu, 27 Oct 2022 at 14:12, Mark Rutland <mark.rutland@xxxxxxx> wrote:
> >
> > On Thu, Oct 27, 2022 at 01:27:41PM +0200, Ard Biesheuvel wrote:
> > > The ARM architecture revision v8.4 introduces a data independent timing
> > > control (DIT) which can be set at any exception level, and instructs the
> > > CPU to avoid optimizations that may result in a correlation between the
> > > execution time of certain instructions and the value of the data they
> > > operate on.
> > >
> > > The DIT bit is part of PSTATE, and is therefore context switched as
> > > usual, given that it becomes part of the saved program state (SPSR) when
> > > taking an exception. We have also defined a hwcap for DIT, and so user
> > > space can discover already whether or nor DIT is available. This means
> > > that, as far as user space is concerned, DIT is wired up and fully
> > > functional.
> > >
> > > In the kernel, however, we never bothered with DIT: we disable at it
> > > boot (i.e., INIT_PSTATE_EL1 has DIT cleared) and ignore the fact that we
> > > might run with DIT enabled if user space happened to set it.
> >
> > FWIW, I did raise this with some architects a while back, and the thinking at
> > the time was that implementations were likely to have data independent timing
> > regardless of the value of the DIT bit, and so manipulating this at exception
> > entry was busy work with no actual gain (especially if we had to handle
> > mismatched big.LITTLE systems).
> >
> > Since then, I have become aware of some possible implementation choices which
> > would consider the bit (though I have no idea if anyone is building such
> > implementations).
> >
>
> It's a bit unfortunate that FEAT_DIT is mandatory even if the uarch in
> question behaves the same regardless. And the fact that it is
> documented as resetting to 0x0, and already being exposed to user
> space doesn't help either. But that doesn't justify keeping this
> disabled in the kernel.
>
> The architecture does not permit us to distinguish between the cases
> where this is just busywork and where it does make a difference. So we
> have to assume it is the latter.
>
> > > Given that running privileged code with DIT disabled on a CPU that
> > > implements support for it may result in a side channel that exposes
> > > privileged data to unprivileged user space processes,
> >
> > I appreciate this is a simple way to rule out issues of that sort, but I think
> > the "may" in that sentence is doing a lot of work, since:
> >
> > * IIUC, we don't have a specific case in mind that we're concerned about. I can
> > believe that we think all the crypto code we intend to be constant time is
> > theoretically affected.
> >
>
> I think this reasoning is backwards. I don't think it is necessary to
> go and identify where we are relying on timing invariance. Crypto
> keeps coming up, and of course, it is a well known example of where
> timing variances may leak the plaintext or the key. But crypto is just
> one way to keep data confidential, and another method we rely on
> heavily is the privilege boundary between kernel and user space. So
> just like all crypto should be constant time, all privileged execution
> should [ideally] be constant time as well.
>
> > * IIUC we haven't gone an audited all constant-time code to check it doesn't
> > happen to use instructions with data-dependent-timing. So there might be more
> > work to do atop this to ensure theoretical correctness.
> >
>
> Sure.
>
> > * AFAIK there are no contemporary implementations where the DIT is both
> > implemented and it being clear results in data-dependent-timing. i.e. we have
> > nothing to actually test on.
> >
>
> Then why on earth are such implementations required to implement FEAT_DIT??
>
> > Given that, it would be nice if we could make this a bit clearer, e.g. state
> > that this is currently a belt-and-braces approach for theoretical cases, rather
> > than an extant side-channel today (as far as we're aware).
> >
>
> Sure - I'll add some extra prose to the commit log to capture the above.
>
> > > let's enable DIT while running in the kernel if supported by all CPUs.
> >
> > FWIW, I think it's reasonable to do this when all CPUs have DIT.
> >
> > I have a slight fear that (as above) if there are future CPUs which do consider
> > DIT, there's presumably a noticeable performance difference (or the CPU would
> > just provide data-independent-timing regardless), but I'm not sure if that's
> > just something we have to live with or could punt on until we notice such
> > cases.
> >
>
> Sure. Another concern might be the overhead of toggling the bit, so
> getting this change in sooner than later might actually help direct
> the development towards implementations where the performance uplift
> substantially outweighs the cycles spent on managing the DIT state.
>
> To me, it seems unlikely that timing dependent optimizations in
> privileged code would benefit actual real-world workloads while not
> resulting in exploitable timing leajs, but user space should be able
> to manage this however it wants.
>
> IOW, yes. Let's pick a safe default, and when use cases turn up where
> disabling DIT makes a substantial difference, we can revisit this.

FWIW, I came to the same conclusion and that's what I implemented for
OpenBSD.

> > > Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> > > Cc: Will Deacon <will@xxxxxxxxxx>
> > > Cc: Mark Rutland <mark.rutland@xxxxxxx>
> > > Cc: Marc Zyngier <maz@xxxxxxxxxx>
> > > Cc: Eric Biggers <ebiggers@xxxxxxxxxx>
> > > Cc: Jason A. Donenfeld <Jason@xxxxxxxxx>
> > > Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> > > Cc: Suzuki K Poulose <suzuki.poulose@xxxxxxx>
> > > Cc: Adam Langley <agl@xxxxxxxxxx>
> > > Link: https://lore.kernel.org/all/YwgCrqutxmX0W72r@xxxxxxxxx/
> > > Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> > > ---
> > > arch/arm64/include/asm/sysreg.h | 3 +++
> > > arch/arm64/kernel/cpufeature.c | 16 ++++++++++++++++
> > > arch/arm64/kernel/entry.S | 3 +++
> > > arch/arm64/tools/cpucaps | 1 +
> > > 4 files changed, 23 insertions(+)
> >
> > Don't we also need to touch __cpu_suspend_exit() in arch/am64/kernel/suspend.c?
> > I'm assuming so given that has __uaccess_enable_hw_pan() today.
> >
>
> Indeed, I missed that.
>
> > Presumably we'd also care about this in KVM hyp code?
> >
>
> Presumably, yes but I didn't investigate thoroughly what that would
> entail. I think this can be managed separately, and I'll look into it
> once the discussion here converges.
>
> > > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> > > index 7d301700d1a9..18e065f5130c 100644
> > > --- a/arch/arm64/include/asm/sysreg.h
> > > +++ b/arch/arm64/include/asm/sysreg.h
> > > @@ -94,15 +94,18 @@
> > > #define PSTATE_PAN pstate_field(0, 4)
> > > #define PSTATE_UAO pstate_field(0, 3)
> > > #define PSTATE_SSBS pstate_field(3, 1)
> > > +#define PSTATE_DIT pstate_field(3, 2)
> > > #define PSTATE_TCO pstate_field(3, 4)
> > >
> > > #define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift))
> > > #define SET_PSTATE_UAO(x) __emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift))
> > > #define SET_PSTATE_SSBS(x) __emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift))
> > > +#define SET_PSTATE_DIT(x) __emit_inst(0xd500401f | PSTATE_DIT | ((!!x) << PSTATE_Imm_shift))
> > > #define SET_PSTATE_TCO(x) __emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift))
> > >
> > > #define set_pstate_pan(x) asm volatile(SET_PSTATE_PAN(x))
> > > #define set_pstate_uao(x) asm volatile(SET_PSTATE_UAO(x))
> > > +#define set_pstate_dit(x) asm volatile(SET_PSTATE_DIT(x))
> > > #define set_pstate_ssbs(x) asm volatile(SET_PSTATE_SSBS(x))
> > >
> > > #define __SYS_BARRIER_INSN(CRm, op2, Rt) \
> > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > > index 6062454a9067..273a74df24fe 100644
> > > --- a/arch/arm64/kernel/cpufeature.c
> > > +++ b/arch/arm64/kernel/cpufeature.c
> > > @@ -2077,6 +2077,11 @@ static void cpu_trap_el0_impdef(const struct arm64_cpu_capabilities *__unused)
> > > sysreg_clear_set(sctlr_el1, 0, SCTLR_EL1_TIDCP);
> > > }
> > >
> > > +static void cpu_enable_dit(const struct arm64_cpu_capabilities *__unused)
> > > +{
> > > + set_pstate_dit(1);
> > > +}
> >
> > I think this wants the same treatment as PAN, i.e.
> > WARN_ON_ONCE(in_interrupt()); with a comment about PSTATE being discareded upon
> > ERET.
> >
>
> This is only called from the CPU feature code, which calls it under
> stop_machine(), so I decided it was not needed. I don't mind adding
> it, though, just seems unnecessary to me.
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>