Re: [RFC] Disable lockref on arm64

From: Will Deacon
Date: Wed Jun 12 2019 - 05:36:38 EST


Hi JC,

On Wed, Jun 12, 2019 at 04:10:20AM +0000, Jayachandran Chandrasekharan Nair wrote:
> On Wed, May 22, 2019 at 05:04:17PM +0100, Will Deacon wrote:
> > On Sat, May 18, 2019 at 12:00:34PM +0200, Ard Biesheuvel wrote:
> > > On Sat, 18 May 2019 at 06:25, Jayachandran Chandrasekharan Nair
> > > <jnair@xxxxxxxxxxx> wrote:
> > > > Looking thru the perf output of this case (open/close of a file from
> > > > multiple CPUs), I see that refcount is a significant factor in most
> > > > kernel configurations - and that too uses cmpxchg (without yield).
> > > > x86 has an optimized inline version of refcount that helps
> > > > significantly. Do you think this is worth looking at for arm64?
> > > >
> > >
> > > I looked into this a while ago [0], but at the time, we decided to
> > > stick with the generic implementation until we encountered a use case
> > > that benefits from it. Worth a try, I suppose ...
> > >
> > > [0] https://lore.kernel.org/linux-arm-kernel/20170903101622.12093-1-ard.biesheuvel@xxxxxxxxxx/
> >
> > If JC can show that we benefit from this, it would be interesting to see if
> > we can implement the refcount-full saturating arithmetic using the
> > LDMIN/LDMAX instructions instead of the current cmpxchg() loops.
>
> Now that the lockref change is mainline, I think we need to take another
> look at this patch.

Before we get too involved with this, I really don't want to start a trend of
"let's try to rewrite all code using cmpxchg() in Linux because of TX2". At
some point, the hardware needs to play ball. However...

Ard's refcount patch was about moving the overflow check out-of-line. A
side-effect of this is that we avoid cmpxchg() in many of the operations
(atomic_add_unless() disappears), and it's /this/ which helps you. So there
may well be a middle ground where we avoid the complexity of the out-of-line
{over,under}flow handling but do the saturation post-atomic inline.
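
To make the distinction concrete, here is a rough user-space sketch using
C11 atomics rather than the kernel's atomic_t API. It is not Ard's patch and
not the in-tree code; the sketch_* names and the SKETCH_SATURATED value are
made up for illustration. The first helper is the cmpxchg()-loop style, the
second does an unconditional atomic add and only checks for saturation
afterwards:

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical "stuck" value, far away from both zero and overflow. */
#define SKETCH_SATURATED (INT_MIN / 2)

/*
 * cmpxchg()-loop style: every increment is a compare-and-swap that
 * must be retried whenever another CPU changed the counter meanwhile.
 */
static bool sketch_inc_not_zero_cmpxchg(atomic_uint *r)
{
	unsigned int old = atomic_load_explicit(r, memory_order_relaxed);

	do {
		if (old == 0)
			return false;	/* object already freed */
		if (old == UINT_MAX)
			return true;	/* saturated: leave it alone */
	} while (!atomic_compare_exchange_weak_explicit(r, &old, old + 1,
							memory_order_relaxed,
							memory_order_relaxed));
	return true;
}

/*
 * Post-atomic style: one unconditional fetch-add, then inspect the
 * value we got back. The check and the fix-up store are separate from
 * the add, so a bad value is briefly visible to other CPUs.
 */
static void sketch_inc_post_atomic(atomic_int *r)
{
	int old = atomic_fetch_add_explicit(r, 1, memory_order_relaxed);

	/* Increment from zero, already saturated, or about to overflow. */
	if (old <= 0 || old == INT_MAX)
		atomic_store_explicit(r, SKETCH_SATURATED,
				      memory_order_relaxed);
}
```

The second form never retries, which is what matters under contention, but
the add and the saturation fix-up are no longer a single atomic step.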

I was hoping we could use LDMIN/LDMAX to maintain the semantics of
REFCOUNT_FULL, but now that I think about it I can't see how we could keep
the arithmetic atomic in that case. Hmm.

Whatever we do, I prefer to keep REFCOUNT_FULL the default option for arm64,
so if we can't keep the semantics when we remove the cmpxchg, you'll need to
opt into this at config time.

Will