Re: Regression on linux-next (next-20260324 )

From: mikhail.v.gavrilov@xxxxxxxxx

Date: Wed Apr 22 2026 - 11:56:35 EST


On Wed, 2026-04-22 at 11:23 +0200, Peter Zijlstra wrote:
>
> How's this? It 'passes' the ww_mutex selftest thing in so far as that
> I
> get the same:
>
> [    2.312369] Beginning ww (wound) mutex selftests
> [    4.853240] stress (stress_inorder_work) failed with -35
> [    9.379572] Beginning ww (die) mutex selftests
> [   16.435831] All ww mutex selftests passed
>
> before the offending commit and after this patch.
>
> ---
> Subject: Subject: locking/mutex: Fix ww_mutex wait_list operations
> From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Date: Wed Apr 22 10:38:41 CEST 2026
>
> Chaitanya and John reported commit 25500ba7e77c ("locking/mutex:
> Remove the
> list_head from struct mutex") wrecked ww_mutex.
>
> Specifically there were 2 issues:
>
>  - __ww_waiter_prev() had the termination condition wrong; it would
> terminate
>    when the previous entry was the first, which results in a
> truncated
>    iteration: W3, W2, (no W1).
>
>  - __mutex_add_waiter(@pos != NULL), as used by __ww_waiter_add() /
>    __ww_mutex_add_waiter(); this inserts @waiter before @pos (which
> is what
>    list_add_tail() does). But this should then also update lock-
> >first_waiter.
>
> Much thanks to Prateek for spotting the __mutex_add_waiter() issue!
>
> Fixes: 25500ba7e77c ("locking/mutex: Remove the list_head from struct
> mutex")
> Reported-by: "Borah, Chaitanya Kumar"
> <chaitanya.kumar.borah@xxxxxxxxx>
> Closes:
> https://lore.kernel.org/r/af005996-05e9-4336-8450-d14ca652ba5d%40intel.com
> Reported-by: John Stultz <jstultz@xxxxxxxxxx>
> Closes:
> https://lore.kernel.org/r/CANDhNCq%3Doizzud3hH3oqGzTrcjB8OwGeineJ3mwZuGdDWG8fRQ%40mail.gmail.com
> Debugged-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> ---
>  kernel/locking/mutex.c    |   40 +++++++++++++++++++++++++++--------
> -----
>  kernel/locking/ww_mutex.h |   34 ++++++++++++++++++++++++++++++++--
>  2 files changed, 59 insertions(+), 15 deletions(-)
>
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -198,27 +198,43 @@ static inline void __mutex_clear_flag(st
>  }
>  
>  /*
> - * Add @waiter to a given location in the lock wait_list and set the
> - * FLAG_WAITERS flag if it's the first waiter.
> + * Add @waiter to the @lock wait_list and set the FLAG_WAITERS flag
> if it's
> + * the first waiter.
> + *
> + * When @pos, @waiter is added before the waiter indicated by @pos.
> Otherwise
> + * @waiter will be added to the tail of the list.
>   */
>  static void
>  __mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
> -    struct mutex_waiter *first)
> +    struct mutex_waiter *pos)
>   __must_hold(&lock->wait_lock)
>  {
> + struct mutex_waiter *first = lock->first_waiter;
> +
>   hung_task_set_blocker(lock, BLOCKER_TYPE_MUTEX);
>   debug_mutex_add_waiter(lock, waiter, current);
>  
> - if (!first)
> - first = lock->first_waiter;
> + if (pos) {
> + /*
> + * Insert @waiter before @pos.
> + */
> + list_add_tail(&waiter->list, &pos->list);
> + /*
> + * If @pos == @first, then @waiter will be the new
> first.
> + */
> + if (pos == first)
> + lock->first_waiter = waiter;
> + return;
> + }
>  
>   if (first) {
>   list_add_tail(&waiter->list, &first->list);
> - } else {
> - INIT_LIST_HEAD(&waiter->list);
> - lock->first_waiter = waiter;
> - __mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
> + return;
>   }
> +
> + INIT_LIST_HEAD(&waiter->list);
> + lock->first_waiter = waiter;
> + __mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
>  }
>  
>  static void
> @@ -229,10 +245,8 @@ __mutex_remove_waiter(struct mutex *lock
>   __mutex_clear_flag(lock, MUTEX_FLAGS);
>   lock->first_waiter = NULL;
>   } else {
> - if (lock->first_waiter == waiter) {
> - lock->first_waiter =
> list_first_entry(&waiter->list,
> -       struct
> mutex_waiter, list);
> - }
> + if (lock->first_waiter == waiter)
> + lock->first_waiter = list_next_entry(waiter,
> list);
>   list_del(&waiter->list);
>   }
>  
> --- a/kernel/locking/ww_mutex.h
> +++ b/kernel/locking/ww_mutex.h
> @@ -6,6 +6,19 @@
>  #define MUTEX_WAITER mutex_waiter
>  #define WAIT_LOCK wait_lock
>  
> +/*
> + *           +--------+
> + *           | first  |
> + *           +--------+
> + *                |
> + *                v
> + *  +----+     +----+     +----+
> + *  | W3 | <-> | W1 | <-> | W2 |
> + *  +----+     +----+     +----+
> + *    ^                     ^
> + *    +---------------------+
> + */
> +
>  static inline struct mutex_waiter *
>  __ww_waiter_first(struct mutex *lock)
>   __must_hold(&lock->wait_lock)
> @@ -13,26 +26,43 @@ __ww_waiter_first(struct mutex *lock)
>   return lock->first_waiter;
>  }
>  
> +/*
> + * for (cur = __ww_waiter_first(); cur; cur = __ww_waiter_next())
> + *
> + * Should iterate like: W1, W2, W3
> + */
>  static inline struct mutex_waiter *
>  __ww_waiter_next(struct mutex *lock, struct mutex_waiter *w)
>   __must_hold(&lock->wait_lock)
>  {
>   w = list_next_entry(w, list);
> + /*
> + * Terminate if the next entry is the first again, that has
> already
> + * been observed.
> + */
>   if (lock->first_waiter == w)
>   return NULL;
>  
>   return w;
>  }
>  
> +/*
> + * for (cur = __ww_waiter_last(); cur; cur = __ww_waiter_prev())
> + *
> + * Should iterate like: W3, W2, W1
> + */
>  static inline struct mutex_waiter *
>  __ww_waiter_prev(struct mutex *lock, struct mutex_waiter *w)
>   __must_hold(&lock->wait_lock)
>  {
> - w = list_prev_entry(w, list);
> + /*
> + * Terminate at the first entry, the previous entry of first
> is the
> + * last and that has already been observed.
> + */
>   if (lock->first_waiter == w)
>   return NULL;
>  
> - return w;
> + return list_prev_entry(w, list);
>  }
>  
>  static inline struct mutex_waiter *


Confirmed on an independent userspace-visible reproducer: Resident
Evil 2/3/4/9 under Proton on AMD Zen4 + RX 7900 XTX, which hangs
deterministically during level load on current master (main thread
parked in futex_waitv). With this patch applied on top of master,
both RE2 and RE9 complete a full playthrough with save-resume on two
independent workstations (ASUS and ASRock B650). No hang, no splats.

Symptom details and third bisect log are in the separate thread at
https://lore.kernel.org/r/CABXGCsO5fKq2nD9nO8yO1z50ZzgCPWqueNXHANjntaswoOh2Dg@xxxxxxxxxxxxxx

Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@xxxxxxxxx>

--
Thanks,
Mikhail