On Thu, Nov 17, 2022 at 09:20:15PM -0500, Waiman Long wrote:
> The lock handoff provided in rwsem isn't a true handoff like that in
> the mutex. Instead, it is more like a quiescent state where optimistic
> spinning and lock stealing are disabled to make it easier for the first
> waiter to acquire the lock.
>
> Reworking the code to enable a true lock handoff is more complex due to
> the following facts:
>
>  1) The RWSEM_FLAG_HANDOFF bit is protected by the wait_lock and it
>     is too expensive to always take the wait_lock in the unlock path
>     to prevent racing.

Specifically, the issue is that we'd need to turn the
atomic_long_add_return_release() into an atomic_try_cmpxchg_release()
loop, like:
	tmp = atomic_long_read(&sem->count);
	do {
		if (tmp & (WAITERS|HANDOFF))
			return slow_unlock();
	} while (atomic_long_try_cmpxchg_release(&sem->count, &tmp,
				tmp - RWSEM_{READER_BIAS,WRITE_LOCKED}));
in order to not race with a concurrent setting of the HANDOFF bit,
right? And we don't like to turn unlock into a cmpxchg loop.
(OTOH we sorta do this for mutex, uncontended mutex has cmpxchg lock and
unlock, any fail and we go to the slow path -- I suppose the distinct
difference is that we sorta expect some contention on the read side)
>  2) The reader lock fast path may add a RWSEM_READER_BIAS at the wrong
>     time to prevent a proper lock handoff from a reader owned rwsem.
This would be much the same, right? We'd have to turn
rwsem_read_trylock() into a cmpxchg-loop and we don't like that.
Therefore we do that speculative add and fix up later.
Now, I'm not entirely sure what issue you allude to here; is the problem
that you can't quite tell when the last reader is gone?
> A lock handoff can only be initiated when the following conditions are
> true:
>
>  1) The RWSEM_FLAG_HANDOFF bit is set.

d'uh ;-)

>  2) The task to do the handoff don't see any other active lock
>     excluding the lock that it might have held.

2) here is the 2) above, right?
> The new handoff mechanism performs handoff in rwsem_wakeup() to minimize
> overhead. The rwsem count will be known at that point to determine if
> handoff should be done. However, there is a small time gap between the
> rwsem becomes free and the wait_lock is taken

Right, that's between atomic_long_fetch_add_release() and calling the
slow path because WAITERS bit is set.

> where a reader can come in and add a RWSEM_READER_BIAS to the count or

Both 2s above.

> the current first waiter can take the rwsem and clear
> RWSEM_FLAG_HANDOFF in the interim.

The actual intended action.

> That will fail the handoff operation.

I would not list that latter as a failure, it's exactly what we want to
happen, no?
> To handle the former case, a secondary handoff will also be done in
> the rwsem_down_read_slowpath() to catch it.

Right. In short:

  Having HANDOFF set:
   - implies WAITERS set
   - disables all fastpaths (spinning or otherwise)
   - dis-allows anybody except first waiter to obtain lock

Therefore, having the window between clearing owner and prodding first
waiter is 'harmless'.
> With true lock handoff, there is no need to do a NULL owner spinning
> anymore as wakeup will be performed if handoff is possible. So it
> is likely that the first waiter won't actually go to sleep even when
> schedule() is called in this case.

Right, removing that NULL spinning was the whole purpose -- except I
seem to have forgotten why it was a problem :-)
OK, lemme go read the actual patch.
Hmmm... you made it a wee bit more complicated, instead of my 3rd clause
above, you added a whole intermediate GRANTED state. Why?
Since we fundamentally must deal with the release->wait_lock hole, why
do we need to do the whole rwsem_wake()->GRANTED->*_slowpath() dance?
Why can't we skip the whole rwsem_wake()->GRANTED part and only do
handoff in the slowpath?