Re: [RFC] Disable lockref on arm64

From: Kees Cook
Date: Tue Jun 18 2019 - 03:37:41 EST


On Mon, Jun 17, 2019 at 06:26:20PM +0100, Will Deacon wrote:
> On Mon, Jun 17, 2019 at 01:33:19PM +0200, Ard Biesheuvel wrote:
> > On my single-core TX2, the comparative performance is as follows:
> >
> > Baseline: REFCOUNT_TIMING test using REFCOUNT_FULL (LSE cmpxchg)
> > 191057942484 cycles # 2.207 GHz
> > 148447589402 instructions # 0.78 insn per cycle
> >
> > 86.568269904 seconds time elapsed
> >
> > Upper bound: ATOMIC_TIMING
> > 116252672661 cycles # 2.207 GHz
> > 28089216452 instructions # 0.24 insn per cycle
> >
> > 52.689793525 seconds time elapsed
> >
> > REFCOUNT_TIMING test using LSE atomics
> > 127060259162 cycles # 2.207 GHz
>
> Ok, so assuming JC's complaint is valid, then these numbers are compelling.
> In particular, my understanding of this thread is that your optimised
> implementation doesn't actually sacrifice any precision; it just changes
> the saturation behaviour in a way that has no material impact. Kees, is that
> right?

That is my understanding, yes. There is no loss to detection precision.
But for clarity, I should point out it has one behavioral change that is
the same change as on x86: the counter is now effectively a 31-bit counter,
not a 32-bit counter, as the sign bit is being used for saturation.
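
To illustrate the 31-bit point, here is a minimal userspace sketch. It is not
the kernel code; the helper names (sketch_is_saturated, sketch_wrapping_inc)
are made up for this example. It only shows that once the sign bit is reserved
as the saturation marker, the largest valid count is 2^31 - 1, and one more
increment lands in the range that reads as saturated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: any value with the sign bit set reads as saturated. */
static bool sketch_is_saturated(int32_t v)
{
    return v < 0;
}

/*
 * Increment with wraparound. The addition is done in unsigned arithmetic to
 * avoid signed-overflow undefined behaviour; converting the result back to
 * int32_t assumes the usual two's complement representation.
 */
static int32_t sketch_wrapping_inc(int32_t v)
{
    return (int32_t)((uint32_t)v + 1u);
}
```

So INT32_MAX is still a valid count, and incrementing it sets the sign bit,
which is how the overflow gets noticed.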

> If so, I'm not against having this for arm64, with the premise that we can
> hide the REFCOUNT_FULL option entirely given that it would only serve to
> confuse if exposed.

If the LSE atomics version has overflow, dec-to-zero, and inc-from-zero
protections, then as far as I'm concerned, REFCOUNT_FULL doesn't need
to exist for arm64. On the Kconfig front, as long as there isn't a way
to revert refcount_t to atomic_t, I'm happy. :)
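
For reference, the three protections named above could be sketched in
userspace C11 roughly as follows. This is an illustration under assumptions,
not the arm64 LSE implementation and not the kernel's refcount_t API: the
sketch_* names, the result enum, and the choice of INT_MIN / 2 as the
saturation value are all hypothetical.

```c
#include <limits.h>
#include <stdatomic.h>

/* Assumed saturation value: deep inside the negative (sign bit set) range. */
#define SKETCH_SATURATED (INT_MIN / 2)

enum sketch_result {
    SKETCH_OK,
    SKETCH_OVERFLOW,      /* counter left the valid positive range */
    SKETCH_INC_FROM_ZERO, /* increment of an already-released object */
    SKETCH_DEC_TO_ZERO,   /* plain decrement dropped the last reference */
};

struct sketch_ref {
    atomic_int v;
};

/* Unconditional atomic add first, range check on the old value afterwards. */
static enum sketch_result sketch_inc(struct sketch_ref *r)
{
    int old = atomic_fetch_add_explicit(&r->v, 1, memory_order_relaxed);

    if (old > 0 && old < INT_MAX)
        return SKETCH_OK;

    /* Pin the counter so that later operations keep failing the check. */
    atomic_store_explicit(&r->v, SKETCH_SATURATED, memory_order_relaxed);
    return old == 0 ? SKETCH_INC_FROM_ZERO : SKETCH_OVERFLOW;
}

/* A plain decrement (no "and test") must never be the one that hits zero. */
static enum sketch_result sketch_dec(struct sketch_ref *r)
{
    int old = atomic_fetch_sub_explicit(&r->v, 1, memory_order_release);

    if (old > 1)
        return SKETCH_OK;

    atomic_store_explicit(&r->v, SKETCH_SATURATED, memory_order_relaxed);
    /* old <= 0 means the counter was already saturated or has underflowed. */
    return old == 1 ? SKETCH_DEC_TO_ZERO : SKETCH_OVERFLOW;
}
```

The fast path in both helpers is a single atomic add or subtract followed by
a compare on the returned value, which is the general shape that lets such a
scheme stay close to plain atomics in cost.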

--
Kees Cook