Re: [PATCH 3/4] test-ww_mutex: Handle transient -EDEADLK in test_cycle_work

From: Haakon Bugge

Date: Mon Jun 22 2026 - 09:23:36 EST

Hi John,

> On 18 Jun 2026, at 21:04, John Stultz <jstultz@xxxxxxxxxx> wrote:
>
>> On Wed, Jun 17, 2026 at 5:13 AM Håkon Bugge <haakon.bugge@xxxxxxxxxx> wrote:
>>
>>
>> There is a timing issue in test_cycle_work(), in the sense that
>> acquiring *a_mutex* after deadlock has been detected on the *b_mutex*
>> may not succeed immediately. This may lead to false negatives, which
>> show up in the log as:
>>
>> cyclic deadlock not resolved, ret[77/93] = -35
>>
>> We fix that by re-trying until the lock is acquired.
>>
>
> I definitely have seen this error in testing previously.
>
> But is this fix right? When getting an EDEADLK I thought the ww_mutex
> protocol requires the task drop all its locks and re-try acquiring
> them all again.

This is true for deadlocks. But my main point is that this is *not* a
deadlock that requires one of the WW resolution methods. Put another
way, if it *were* a deadlock, my do-while loop would loop forever. It
doesn't, and therefore it isn't a deadlock.

Another question, of course, is why ww_mutex_lock() returns -EDEADLK
when there is no deadlock.

>> Fixes: d1b42b800e5d ("locking/ww_mutex: Add kselftests for resolving ww_mutex cyclic deadlocks")
>> Fixes: e4a02ed2aaf4 ("locking/ww_mutex: Fix runtime warning in the WW mutex selftest")
>> Signed-off-by: Håkon Bugge <haakon.bugge@xxxxxxxxxx>
>> ---
>> kernel/locking/test-ww_mutex.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
>> index 5a4c92801bdfb..6b29a7a8f5fba 100644
>> --- a/kernel/locking/test-ww_mutex.c
>> +++ b/kernel/locking/test-ww_mutex.c
>> @@ -306,7 +306,9 @@ static void test_cycle_work(struct work_struct *work)
>> err = 0;
>> ww_mutex_unlock(&cycle->a_mutex);
>> ww_mutex_lock_slow(cycle->b_mutex, &ctx);
>> - erra = ww_mutex_lock(&cycle->a_mutex, &ctx);
>> + do {
>> + erra = ww_mutex_lock(&cycle->a_mutex, &ctx);
>> + } while (erra == -EDEADLK);
>> }
>>
> I don't have a clear example in mind, but just trying to grab the same
> lock again (especially in a loop with no timeout) seems like it could
> open up other problems here.

The do { ... } while (erra == -EDEADLK) loop in test_cycle_work()
terminates under the ww_mutex progress guarantee. The test builds a
finite circular dependency in which worker N waits on worker N+1's
a_mutex. On -EDEADLK, the worker releases its own a_mutex, acquires
b_mutex through the slow path, and retries. Each deadlock encounter
thus removes one hold from the cycle before the retry. Since the
cycle is finite and every participant follows the same backoff rule,
the deadlock chain must eventually collapse; once the conflicting
owner releases its lock, the retry of a_mutex succeeds and the loop
exits.

Thxs, Håkon