Re: [RFC] Disable lockref on arm64

From: Will Deacon
Date: Fri Jun 14 2019 - 06:03:38 EST


[+Kees]

On Fri, Jun 14, 2019 at 07:09:26AM +0000, Jayachandran Chandrasekharan Nair wrote:
> On Wed, Jun 12, 2019 at 10:31:53AM +0100, Will Deacon wrote:
> > On Wed, Jun 12, 2019 at 04:10:20AM +0000, Jayachandran Chandrasekharan Nair wrote:
> > > Now that the lockref change is mainline, I think we need to take another
> > > look at this patch.
> >
> > Before we get too involved with this, I really don't want to start a trend of
> > "let's try to rewrite all code using cmpxchg() in Linux because of TX2".
>
> x86 added an arch-specific fast refcount implementation - and the commit
> specifically notes that it is faster than cmpxchg based code[1].
>
> There seems to be an ongoing effort to move over more and more subsystems
> from atomic_t to refcount_t (e.g. [2]), specifically because refcount_t on
> x86 is fast enough and you get some error checking that atomic_t does not
> have.

Correct, but there are also some cases that are only caught by
REFCOUNT_FULL.
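
To illustrate the kind of error checking under discussion, here is a minimal
userspace sketch using C11 atomics. It is not the kernel's refcount_t code:
checked_inc() is a hypothetical name, and the behaviour shown (refuse to
increment from zero, saturate at UINT_MAX rather than wrap) is an assumption
about what a fully checked counter would do, built on a cmpxchg-style loop.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical checked counter; illustrative only, not kernel API. */
static bool checked_inc(atomic_uint *count)
{
	unsigned int old = atomic_load_explicit(count, memory_order_relaxed);

	for (;;) {
		if (old == 0)
			return false;	/* increment from zero: likely use-after-free */
		if (old == UINT_MAX)
			return true;	/* saturated: stay pinned instead of wrapping */
		if (atomic_compare_exchange_weak_explicit(count, &old, old + 1,
							  memory_order_relaxed,
							  memory_order_relaxed))
			return true;
		/* CAS failed: 'old' now holds the current value, so re-check it. */
	}
}
```

A plain atomic_fetch_add() on the same counter would silently wrap from
UINT_MAX to 0, which is the case the checks are there to catch. The price is
that every increment becomes a compare-and-swap loop.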

> > At some point, the hardware needs to play ball. However...
>
> Even on a totally baller CPU, REFCOUNT_FULL is going to be slow :)
> On TX2, this specific benchmark just highlights the issue, but the
> difference is significant even on x86 (as noted above).

My point was more general than that. If you want scalable concurrent code,
then you end up having to move away from the serialisation introduced by
locking. The main trick in the toolbox is cmpxchg() so, in the absence of
a zoo of special-purpose atomic instructions, it really needs to do better
than serialising.
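
For reference, the pattern I mean looks roughly like the sketch below, written
as userspace C11 rather than kernel code. counter_inc_not_zero() is a
hypothetical helper, and the memory orderings are an assumption chosen for the
example, not a statement about what lockref or refcount_t use.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace analogue of a lockless "get unless dead" operation. */
static bool counter_inc_not_zero(atomic_uint *count)
{
	unsigned int old = atomic_load_explicit(count, memory_order_relaxed);

	do {
		if (old == 0)
			return false;	/* already released; do not resurrect it */
		/* On failure, 'old' is refreshed with the current value and we retry. */
	} while (!atomic_compare_exchange_weak_explicit(count, &old, old + 1,
							memory_order_acquire,
							memory_order_relaxed));
	return true;
}
```

No lock is taken, so concurrent callers only retry when they actually collide
on the same word. If the hardware effectively serialises every
compare-and-swap, that advantage disappears.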

> > I was hoping we could use LDMIN/LDMAX to maintain the semantics of
> > REFCOUNT_FULL, but now that I think about it I can't see how we could keep
> > the arithmetic atomic in that case. Hmm.
>
> Do you think Ard's patch needs changes before it can be considered? I
> can take a look at that.

I would like to see how it performs if we keep the checking inline, yes.
I suspect Ard could spin this in short order.

> > Whatever we do, I prefer to keep REFCOUNT_FULL the default option for arm64,
> > so if we can't keep the semantics when we remove the cmpxchg, you'll need to
> > opt into this at config time.
>
> Only arm64 and arm select REFCOUNT_FULL in the default config. So please
> reconsider this! This is going to slow down arm64 vs. other archs and it
> will become worse when more code adopts refcount_t.

Maybe, but faced with the choice between your micro-benchmark results and
security-by-default for people using the arm64 Linux kernel, I really think
that's a no-brainer. I'm well aware that not everybody agrees with me on
that.

Will