Re: [RFC] Disable lockref on arm64

From: Ard Biesheuvel
Date: Mon Jun 17 2019 - 07:38:25 EST


On Sun, 16 Jun 2019 at 23:31, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>
> On Sat, Jun 15, 2019 at 04:18:21PM +0200, Ard Biesheuvel wrote:
> > Yes, I am using the same saturation point as x86. In this example, I
> > am not entirely sure I understand why it matters, though: the atomics
> > guarantee that the write by CPU2 fails if CPU1 changed the value in
> > the mean time, regardless of which value it wrote.
> >
> > I think the concern is more related to the likelihood of another CPU
> > doing something nasty between the moment that the refcount overflows
> > and the moment that the handler pins it at INT_MIN/2, e.g.,
> >
> > > CPU 1                                CPU 2
> > > inc()
> > > load INT_MAX
> > > about to overflow?
> > > yes
> > >
> > > set to 0
> > >                                       <insert exploit here>
> > > set to INT_MIN/2
>
> Ah, gotcha, but the "set to 0" is really "set to INT_MAX+1" (not zero)
> if you're using the same saturation.
>

Of course. So there is no issue here: whatever manipulations are
racing with the overflow handler can never cause the counter to
unsaturate.
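
To make that concrete, here is a minimal user-space model of the
saturation behaviour (purely illustrative: model_inc() is a name made
up for this sketch, not the kernel code). Once the counter is pinned
at INT_MIN/2, a racing increment leaves it deeply negative, and the
post-check simply re-pins it.

#include <limits.h>
#include <stdatomic.h>
#include <stdio.h>

/* Toy model of a saturated refcount, not the arm64 implementation. */
static atomic_int ref = INT_MIN / 2;		/* already saturated */

static void model_inc(atomic_int *r)
{
	/* unconditional atomic add, checked after the fact */
	int new = atomic_fetch_add_explicit(r, 1, memory_order_relaxed) + 1;

	if (new < 0)				/* overflow/saturation detected */
		atomic_store_explicit(r, INT_MIN / 2, memory_order_relaxed);
}

int main(void)
{
	for (int i = 0; i < 1000; i++)
		model_inc(&ref);

	/* still INT_MIN/2: racing increments cannot unsaturate it */
	printf("counter = %d\n", atomic_load(&ref));
	return 0;
}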

And actually, moving the checks before the stores is not as trivial as
I thought. E.g., for the LSE refcount_add case, we have

" ldadd %w[i], w30, %[cval]\n" \
" adds %w[i], %w[i], w30\n" \
REFCOUNT_PRE_CHECK_ ## pre (w30)) \
REFCOUNT_POST_CHECK_ ## post \

and changing this into load/test/store defeats the purpose of using
the LSE atomics in the first place.
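
For comparison, here is a rough C-level sketch of the two shapes, using
the GCC atomic builtins rather than the actual arch/arm64 refcount
macros (refcount_add_postcheck()/refcount_add_precheck() are names made
up for illustration). The post-checked variant is a single
unconditional atomic add, which the compiler can emit as LDADD when LSE
is available, whereas a pre-checked variant degenerates into a
load/test/store (compare-and-swap) loop.

#include <limits.h>
#include <stdbool.h>

/* Post-check: one unconditional atomic add (LDADD with LSE), test afterwards. */
static inline void refcount_add_postcheck(int i, int *r)
{
	int new = __atomic_add_fetch(r, i, __ATOMIC_RELAXED);

	if (new < 0)					/* saturate after the store */
		__atomic_store_n(r, INT_MIN / 2, __ATOMIC_RELAXED);
}

/* Pre-check: load, test, then conditionally store, i.e. a CAS loop (i > 0). */
static inline bool refcount_add_precheck(int i, int *r)
{
	int old = __atomic_load_n(r, __ATOMIC_RELAXED);

	do {
		if (old < 0 || old > INT_MAX - i)	/* reject before storing */
			return false;
	} while (!__atomic_compare_exchange_n(r, &old, old + i, true,
					      __ATOMIC_RELAXED,
					      __ATOMIC_RELAXED));
	return true;
}

Under contention the pre-checked version keeps retrying the CAS, which
is exactly the retry loop the LSE atomics are meant to avoid.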

On my TX2 (single core), the comparative performance is as follows:

Baseline: REFCOUNT_TIMING test using REFCOUNT_FULL (LSE cmpxchg)
    191057942484      cycles                    #    2.207 GHz
    148447589402      instructions              #    0.78  insn per cycle

      86.568269904 seconds time elapsed

Upper bound: ATOMIC_TIMING
    116252672661      cycles                    #    2.207 GHz
     28089216452      instructions              #    0.24  insn per cycle

      52.689793525 seconds time elapsed

REFCOUNT_TIMING test using LSE atomics
    127060259162      cycles                    #    2.207 GHz
               0      instructions              #    0.00  insn per cycle

      57.243690077 seconds time elapsed