Re: [PATCH v3 1/5] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

From: Waiman Long
Date: Mon Oct 24 2022 - 17:17:15 EST


On 10/24/22 09:17, Peter Zijlstra wrote:
On Mon, Oct 17, 2022 at 05:13:52PM -0400, Waiman Long wrote:
A non-first waiter can potentially spin in the for loop of
rwsem_down_write_slowpath() without sleeping but fail to acquire the
lock even if the rwsem is free if the following sequence happens:

Non-first waiter First waiter Lock holder
---------------- ------------ -----------
Acquire wait_lock
rwsem_try_write_lock():
Set handoff bit if RT or
wait too long
Set waiter->handoff_set
Release wait_lock
Acquire wait_lock
Inherit waiter->handoff_set
Release wait_lock
Clear owner
Release lock
if (waiter.handoff_set) {
rwsem_spin_on_owner(();
if (OWNER_NULL)
goto trylock_again;
}
trylock_again:
Acquire wait_lock
rwsem_try_write_lock():
if (first->handoff_set && (waiter != first))
return false;
Release wait_lock

It is especially problematic if the non-first waiter is an RT task and
it is running on the same CPU as the first waiter as this can lead to
live lock.
I'm struggling to connect the Changelog to the actual patch. I see the
problem, but I don't see how the below helps or is even related to the
described problem.

Sorry if the description isn't clear, I will rephrase it to make it clearer. The basic idea is that a non-first waiter can mistakenly believe that it can spin on the lock. However, when rwsem_try_write_lock() is called, it can never acquire the lock and move on because it is not the first waiter:

                       if (first->handoff_set && (waiter != first))
                                return false;

If that waiter happen to be an RT task, it can block the real first waiter to acquire the lock if it happen to run the same CPU.

Cheers,
Longman