Re: [GIT PULL] locking/urgent for v5.12

From: Peter Zijlstra
Date: Mon Apr 26 2021 - 04:05:14 EST


On Sun, Apr 25, 2021 at 01:06:52PM -0400, Waiman Long wrote:
> On 4/25/21 12:39 PM, Linus Torvalds wrote:

> > > I'm assuming it's because of the switch to try_cmpxchg by PeterZ?
>
> Yes, try_cmpxchg() requires a variable to hold the new value as well as a
> place to return the actual value before the cmpxchg(). It is just the way
> try_cmpxchg() works.

Right; since it returns a boolean, the old value has to be returned
through a pointer argument.
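
For illustration only, a userspace sketch of the two calling
conventions. It uses C11 atomics rather than the kernel implementation,
and the demo_* names are made up for this example:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * cmpxchg()-style: returns the value found in *v. The caller has to
 * compare that against 'old' to learn whether the store happened.
 */
static int demo_cmpxchg(atomic_int *v, int old, int newval)
{
	/* On failure C11 writes the value it found into 'old'. */
	atomic_compare_exchange_strong(v, &old, newval);
	return old;
}

/*
 * try_cmpxchg()-style: returns success as a boolean, so the value
 * found in *v has to come back through the 'old' pointer instead.
 */
static bool demo_try_cmpxchg(atomic_int *v, int *old, int newval)
{
	return atomic_compare_exchange_strong(v, old, newval);
}

/* Exercise both forms against the same starting value. */
static void demo_calling_conventions(void)
{
	atomic_int v = 5;
	int old = 3;

	assert(demo_cmpxchg(&v, 3, 7) == 5);	/* mismatch: no store */
	assert(!demo_try_cmpxchg(&v, &old, 7));	/* same mismatch ... */
	assert(old == 5);			/* ... reported via *old */
	assert(demo_try_cmpxchg(&v, &old, 7));	/* retry now succeeds */
	assert(atomic_load(&v) == 7);
}
```

The failed attempt leaves the current value in 'old', so the caller can
retry without a separate reload.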


> > > New confusion:
> > > - Why is the truly non-critical cmpxchg using "try_cmpxchg()", when
> > > the _first_ cmpxchg - above the loop - is not?

> At least for x86, try_cmpxchg() seems to produce slightly better assembly
> code than the regular cmpxchg(). I guess that may be one of the reasons Peter
> changed it to use try_cmpxchg(). Another reason that I can think of is to
> make the code fit in one line instead of splitting it up into two lines like
> the original version from Ali.

Right, x86 generates slightly better asm (and potentially so does any
architecture that has CAS state in condition codes) while it's a wash
for other architectures (specifically, we checked at the time that arm64
didn't generate worse code).
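
A rough userspace sketch of where the difference comes from, again with
C11 atomics and made-up demo_* helpers rather than the kernel macros.
On x86 the CAS instruction reports success in ZF, so the second loop
can branch on that flag, while the first has to compare the returned
value against 'old' once more:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two kernel calling conventions. */
static int demo_cmpxchg(atomic_int *v, int old, int newval)
{
	atomic_compare_exchange_strong(v, &old, newval);
	return old;	/* value found in *v, whether or not we stored */
}

static bool demo_try_cmpxchg(atomic_int *v, int *old, int newval)
{
	return atomic_compare_exchange_strong(v, old, newval);
}

/* cmpxchg() form: the returned value is compared against 'old' again. */
static int demo_set_mask_cmpxchg(atomic_int *v, int mask)
{
	int old = atomic_load(v);

	for (;;) {
		int seen = demo_cmpxchg(v, old, old | mask);

		if (seen == old)	/* explicit compare of the result */
			return old;
		old = seen;		/* manual reload for the retry */
	}
}

/* try_cmpxchg() form: the success flag is the loop condition. */
static int demo_set_mask_try_cmpxchg(atomic_int *v, int mask)
{
	int old = atomic_load(v);

	while (!demo_try_cmpxchg(v, &old, old | mask))
		;	/* 'old' was refreshed by the failed attempt */
	return old;
}
```

Both functions return the value seen before the update; whether a given
compiler actually emits better code for the second form is something
to check per architecture.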

> > >
> > > Pre-existing confusion:
> > > - Why is the code using "atomic_add()" to set a bit?
> > >
> > > Yeah, yeah, neither of these are *bugs*, but Christ is that code hard
> > > to read. The "use add to set a bit" is valid because of the spinlock
> > > serialization (ie only one add can ever happen), and the
> > > cmpxchg-vs-try_cmpxchg confusion isn't buggy, it's just really really
> > > confusing that that same function is using two different - but
> > > equivalent - cmpxchg things on the same variable literally a couple of
> > > lines apart.
> As you have said, the spinlock serialization makes sure that only 1 writer
> is allowed to do that. I agree that using atomic_or() looks better in this
> case. Both of them are equivalent in this particular case.

Agreed. I think the reason is that, with the read side doing the BIAS
add/sub, some of that snuck into the write side. AFAIK no arch lacks
the atomic_or() intrinsic. The one that's often an issue is
atomic_fetch_or() (x86 for one doesn't have it :/).
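
To make the add-vs-or equivalence concrete, a small userspace sketch
with C11 atomics. The DEMO_* names and bit values are assumptions for
illustration, not the qrwlock definitions, and the fetch variants are
used only so the result can be inspected:

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative bit layout; names and values are assumptions. */
#define DEMO_W_WAITING	0x100	/* writer-waiting flag */
#define DEMO_R_BIAS	0x200	/* one reader, counted above the flag */

/* Set the flag by addition: only correct while the bit is clear. */
static int demo_set_waiting_add(atomic_int *cnts)
{
	return atomic_fetch_add(cnts, DEMO_W_WAITING) + DEMO_W_WAITING;
}

/* Set the flag by OR: idempotent, safe even if the bit is already set. */
static int demo_set_waiting_or(atomic_int *cnts)
{
	return atomic_fetch_or(cnts, DEMO_W_WAITING) | DEMO_W_WAITING;
}

static void demo_add_vs_or(void)
{
	atomic_int a = 3 * DEMO_R_BIAS;	/* three readers, flag clear */
	atomic_int b = 3 * DEMO_R_BIAS;

	/* Bit clear (the serialized single-writer case): same result. */
	assert(demo_set_waiting_add(&a) == demo_set_waiting_or(&b));

	/* Bit already set: add carries into the reader count, or doesn't. */
	assert(demo_set_waiting_add(&a) == 4 * DEMO_R_BIAS);
	assert(demo_set_waiting_or(&b) == 3 * DEMO_R_BIAS + DEMO_W_WAITING);
}
```

The second pair of asserts is the case the spinlock rules out: without
that serialization a repeated add would corrupt the reader count.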

> > > I've pulled this, but can we please
> > >
> > > - make *both* of the cmpxchg's use "try_cmpxchg()" (and thus that
> > > "cnts" variable)?
> Yes, we can certainly change the other cmpxchg() to try_cmpxchg().
> > >
> > > - add a comment about _why_ it's doing "atomic_add()" instead of the
> > > much more logical "atomic_or()", and about how the spinlock serializes
> > > it
> > >
> > > I'm assuming the "atomic_add()" is simply because many more
> > > architectures have that as an actual intrinsic atomic. I understand.
> > > But it's really really not obvious from the code.
> > >
> I will post a patch to make the suggested change to qrwlock.c.

Thanks.