Re: [PATCH 4/4] test-ww_mutex: Retry lock acquisition after timeout if deadlocked

From: Haakon Bugge

Date: Mon Jun 22 2026 - 09:55:31 EST

> On 18 Jun 2026, at 20:55, John Stultz <jstultz@xxxxxxxxxx> wrote:
> On Wed, Jun 17, 2026 at 5:13 AM Håkon Bugge <haakon.bugge@xxxxxxxxxx> wrote:

>> stress_inorder_work() terminates when its timeout expires. If the
>> final lock acquisition attempt returns -EDEADLK at that point, the
>> test may report a false failure even though the deadlock would have
>> been resolved shortly thereafter. Retry a limited number of times
>> after timeout before reporting failure.
>>
>> Without this commit, we may see in the log:
>>
>> Beginning ww (wound) mutex selftests
>> stress (stress_inorder_work) failed with -35
>>
>> Fixes: cfa92b6d5207 ("locking/ww_mutex/test: Make sure we bail out instead of livelock")
>> Signed-off-by: Håkon Bugge <haakon.bugge@xxxxxxxxxx>
>> ---
>> kernel/locking/test-ww_mutex.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
>> index 6b29a7a8f5fba..95de03179eac0 100644
>> --- a/kernel/locking/test-ww_mutex.c
>> +++ b/kernel/locking/test-ww_mutex.c
>> @@ -444,6 +444,7 @@ static void stress_inorder_work(struct work_struct *work)
>> struct ww_acquire_ctx ctx;
>> int *order;
>> int err;
>> + int attempts_after_tmout = 10000;
>>
>> stress->result = -ENOMEM;
>>
>> @@ -476,7 +477,7 @@ static void stress_inorder_work(struct work_struct *work)
>> ww_mutex_unlock(&locks[order[n]]);
>>
>> if (err == -EDEADLK) {
>> - if (!time_after(jiffies, stress->timeout)) {
>> + if ((!time_after(jiffies, stress->timeout)) || attempts_after_tmout--) {
>> ww_mutex_lock_slow(&locks[order[contended]], &ctx);
>> goto retry;
>> }
>
> Thanks for sending this out! I definitely have seen timeouts hit in my
> previous testing. Though, the approach here of "trying harder after
> the timeout" at first glance strikes me as a procrastinator's
> panic. :)

Hehe, I think the undersigned is a good example of a procrastinator ;-)

> It might be nice to have a bit more context in the commit message as
> to the soundness of this approach.

I wrote something about it in the cover letter:

<quote>
It is worth mentioning that the last patch retries up to 10,000 lock
acquisitions after timeout. This value was determined empirically: 10
and 100 retries were insufficient on the test systems, while 10,000
provided stable results and avoided false negatives.
</quote>

Shall I add that to the commit message?
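To make the semantics of the patched condition explicit, here is a minimal user-space sketch of the retry decision. should_retry() and retries_after_timeout() are hypothetical helpers, not code from test-ww_mutex.c, and timed_out stands in for time_after(jiffies, stress->timeout). It shows that, because || short-circuits, the 10,000-attempt budget is only consumed once the timeout has expired.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the patched condition:
 *
 *   if ((!time_after(jiffies, stress->timeout)) || attempts_after_tmout--)
 *
 * timed_out stands in for time_after(jiffies, stress->timeout).
 * Because || short-circuits, the budget is left untouched before the
 * timeout and is only decremented once the timeout has expired.
 */
static bool should_retry(bool timed_out, int *attempts_after_tmout)
{
	return !timed_out || (*attempts_after_tmout)-- != 0;
}

/*
 * Count how many extra retries the policy allows after the timeout,
 * pessimistically assuming every attempt returns -EDEADLK.
 */
static int retries_after_timeout(int budget)
{
	int retries = 0;

	assert(budget >= 0);	/* the patch starts from a positive budget */
	while (should_retry(true, &budget))
		retries++;
	return retries;
}
```

For example, retries_after_timeout(10000) returns 10000: failure is reported only after all 10,000 extra attempts past the timeout have deadlocked, while before the timeout retries remain unlimited.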

> If the deadlock resolution takes
> longer than we expect in time,

That is not the case. When we terminate due to the timeout, the
deadlock has not been resolved at all. As I see it, each iteration has
some probability, say 10%, of running into a deadlock. If the timeout
expires and we are unlucky enough that the last iteration deadlocked,
we cannot simply return -EDEADLK, as that would mark the test as
failed. Instead, we retry a bounded number of additional iterations in
order to resolve the deadlock.
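The argument above can be made concrete with a toy model. Assuming, purely for illustration, that each retry deadlocks independently with probability p (the 10% mentioned above), the chance that all n retries after the timeout still return -EDEADLK, i.e. a false failure, is p^n. The function name false_failure_probability is hypothetical, and real iterations are not independent, so this only shows why a bounded number of extra retries makes a false failure unlikely; it does not explain why 10 and 100 retries proved insufficient in practice.

```c
#include <assert.h>

/*
 * Toy model (hypothetical): if each retry independently deadlocks with
 * probability p, the chance that all n retries after the timeout end in
 * -EDEADLK, i.e. a false test failure, is p^n.
 */
static double false_failure_probability(double p, int n)
{
	double prob = 1.0;

	assert(p >= 0.0 && p <= 1.0);
	assert(n >= 0);
	for (int i = 0; i < n; i++)
		prob *= p;	/* every one of the n retries must deadlock */
	return prob;
}
```

With p = 0.1 and n = 0 (the unpatched behaviour once the last iteration has deadlocked), the result is 1.0; each extra retry multiplies it by p.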

> is there a reason the number of
> iterations tried is also important? Is the key to this change just
> maybe ensuring we got enough cputime within the timeout? ww_mutexes
> are intended to ensure forward progress, but the apparent livelocks
> seen prior to cfa92b6d5207 may hint that something else might be
> going on, so while this may eliminate any timeouts hit due to
> scheduling delays, just trying more may not resolve all failures here.
>
> So, no objections to the change here, but maybe it's worth improving
> the commit message a bit more?

Yes, let me know how you feel about adding the cover letter text
quoted above to the commit message.

Thxs, Håkon