Re: [PATCH v2 5/5] arm64: entry: Enable random_kstack_offset support
From: Kees Cook
Date: Thu Mar 26 2020 - 12:31:36 EST
On Thu, Mar 26, 2020 at 11:15:21AM +0000, Mark Rutland wrote:
> On Wed, Mar 25, 2020 at 01:22:07PM -0700, Kees Cook wrote:
> > On Wed, Mar 25, 2020 at 01:21:27PM +0000, Mark Rutland wrote:
> > > On Tue, Mar 24, 2020 at 01:32:31PM -0700, Kees Cook wrote:
> > > > Allow for a randomized stack offset on a per-syscall basis, with roughly
> > > > 5 bits of entropy.
> > > >
> > > > Signed-off-by: Kees Cook <keescook@xxxxxxxxxxxx>
> > >
> > > Just to check, do you have an idea of the impact on arm64? Patch 3 had
> > > figures for x86 where it reads the TSC, and it's unclear to me how
> > > get_random_int() compares to that.
> >
> > I didn't do a measurement on arm64 since I don't have a good bare-metal
> > test environment. I know Andy Lutomirki has plans for making
> > get_random_get() as fast as possible, so that's why I used it here.
>
> Ok. I suspect I also won't get the chance to test that in the next few
> days, but if I do I'll try to share the results.
Okay, thanks! I can try a rough estimate under emulation, but I assume
that'll be mostly useless. :)
> My concern here was that, get_random_int() has to grab a spinlock and
> mess with IRQ masking, so has the potential to block for much longer,
> but that might not be an issue in practice, and I don't think that
> should block these patches.
Gotcha. I was already surprised by how "heavy" the per-cpu access was
when I looked at the resulting assembly (there looked to be preempt
stuff, etc). But my hope was that this is configurable so people can
measure for themselves if they want it, and most people who want this
feature have a high tolerance for performance trade-offs. ;)
> > I couldn't figure out if there was a comparable instruction like rdtsc
> > in aarch64 (it seems there's a cycle counter, but I found nothing in
> > the kernel that seemed to actually use it)?
>
> AArch64 doesn't have a direct equivalent. The generic counter
> (CNTxCT_EL0) is the closest thing, but its nominal frequency is
> typically much lower than the nominal CPU clock frequency (unlike TSC
> where they're the same). The cycle counter (PMCCNTR_EL0) is part of the
> PMU, and can't be relied on in the same way (e.g. as perf reprograms it
> to generate overflow events, and it can stop for things like WFI/WFE).
Okay, cool; thanks for the details! It's always nice to confirm I didn't
miss some glaringly obvious solution. ;)
For a potential v2, should I add your reviewed-by or wait for your
timing analysis, etc?
--
Kees Cook