Re: [PATCH 4/4] test-ww_mutex: Retry lock acquisition after timeout if deadlocked
From: John Stultz
Date: Thu Jun 18 2026 - 14:55:56 EST
On Wed, Jun 17, 2026 at 5:13 AM Håkon Bugge <haakon.bugge@xxxxxxxxxx> wrote:
>
> stress_inorder_work() terminates when its timeout expires. If the
> final lock acquisition attempt returns -EDEADLK at that point, the
> test may report a false failure even though the deadlock would have
> been resolved shortly thereafter. Retry a limited number of times
> after timeout before reporting failure.
>
> Without this commit, we may see in the log:
>
> Beginning ww (wound) mutex selftests
> stress (stress_inorder_work) failed with -35
>
> Fixes: cfa92b6d5207 ("locking/ww_mutex/test: Make sure we bail out instead of livelock")
> Signed-off-by: Håkon Bugge <haakon.bugge@xxxxxxxxxx>
> ---
> kernel/locking/test-ww_mutex.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
> index 6b29a7a8f5fba..95de03179eac0 100644
> --- a/kernel/locking/test-ww_mutex.c
> +++ b/kernel/locking/test-ww_mutex.c
> @@ -444,6 +444,7 @@ static void stress_inorder_work(struct work_struct *work)
> struct ww_acquire_ctx ctx;
> int *order;
> int err;
> + int attempts_after_tmout = 10000;
>
> stress->result = -ENOMEM;
>
> @@ -476,7 +477,7 @@ static void stress_inorder_work(struct work_struct *work)
> ww_mutex_unlock(&locks[order[n]]);
>
> if (err == -EDEADLK) {
> - if (!time_after(jiffies, stress->timeout)) {
> + if ((!time_after(jiffies, stress->timeout)) || attempts_after_tmout--) {
> ww_mutex_lock_slow(&locks[order[contended]], &ctx);
> goto retry;
> }
Thanks for sending this out! I definitely have seen timeouts hit in my
previous testing. Though, the approach here of "trying harder after
the timeout" at first glance strikes me as like a procrastinator's
panic. :)
It might be nice to have a bit more context in the commit message as
to the soundness of this approach. If the deadlock resolution takes
longer than we expect in time, is there a reason the number of
iterations tried is also important? Is the key to this change just
maybe ensuring we got enough cputime within the timeout? ww_mutexes
are intended to ensure forward progress, but the apparent livelocks
seen prior to cfa92b6d5207 should maybe hint that something else might be
going on, so while this may eliminate any timeouts hit due to
scheduling delays, just trying more may not resolve all failures here.
So, no objections to the change here, but maybe it's worth improving
the commit message a bit more?
thanks
-john