Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex
From: Waiman Long
Date: Tue Aug 23 2016 - 15:36:33 EST
On 08/23/2016 12:57 PM, Peter Zijlstra wrote:
On Tue, Aug 23, 2016 at 09:35:03AM -0700, Jason Low wrote:
On Tue, 2016-08-23 at 09:17 -0700, Davidlohr Bueso wrote:
What's the motivation here? Is it just to unify counter and owner for
the starvation issue? If so, is this really the path we wanna take for
a small debug corner case?
And we thought our other patch was a bit invasive :-)
So I've wanted to do something like this for a while now, and Linus
saying he wanted to always enable the spinning and basically reduce
special cases made me bite the bullet and just do it to see what it
would look like.
So it not only unifies counter and owner for the starvation case, it
does so to allow spinning and debug as well as lock handoff.
It collapses the whole count+owner+yield_to_owner into a single
variable.
It obviously is a tad invasive, but it does make things more similar to
rt-mutex and pi futex, both of which track the owner and pending in the
primary 'word'.
That said, I don't particularly like the new mutex_unlock() code, its
rather more heavy than I would like, although typically the word is
uncontended at unlock and we'd only need a single go at the
cmpxchg-loop.
I think this is the right way to go. There isn't any big change in the
slowpath, so the contended performance should be the same. The fastpath,
however, will get a bit slower as a single atomic op plus a jump
instruction (a single cacheline load) is replaced by a read-and-test and
compxchg (potentially 2 cacheline loads) which will be somewhat slower
than the optimized assembly code. Alternatively, you can replace the
__mutex_trylock() in mutex_lock() by just a blind cmpxchg to optimize
the fastpath further. A cmpxhcg will still be a tiny bit slower than
other atomic ops, but it will be more acceptable, I think.
BTW, I got the following compilation warning when I tried your patch:
drivers/gpu/drm/i915/i915_gem_shrinker.c: In function ‘mutex_is_locked_by’:
drivers/gpu/drm/i915/i915_gem_shrinker.c:44:22: error: invalid operands
to binary == (have ‘atomic_long_t’ and ‘struct task_struct *’)
return mutex->owner == task;
^
CC [M] drivers/gpu/drm/i915/intel_psr.o
drivers/gpu/drm/i915/i915_gem_shrinker.c:49:1: warning: control reaches
end of non-void function [-Wreturn-type]
}
^
make[4]: *** [drivers/gpu/drm/i915/i915_gem_shrinker.o] Error 1
Apparently, you may need to look to see if there are other direct access
of the owner field in the other code.
Cheers,
Longman