Re: Regression on linux-next (next-20260324)

From: Peter Zijlstra

Date: Fri Mar 27 2026 - 12:41:18 EST


On Fri, Mar 27, 2026 at 07:09:26PM +0530, Borah, Chaitanya Kumar wrote:
> Hello Matthew,
>
> Hope you are doing well. I am Chaitanya from the linux graphics team in
> Intel.
>
> This mail is regarding a regression we are seeing in our CI runs[1] on
> linux-next repository.
>
> Since the version next-20260324 [2], we are seeing the following regression
>
> `````````````````````````````````````````````````````````````````````````````````
> <5>[ 157.361977] [IGT] Inactivity timeout exceeded. Killing the current test with SIGQUIT.
> <6>[ 157.362097] sysrq: HELP : loglevel(0-9) reboot(b) crash(c) show-all-locks(d) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z) replay-kernel-logs(R)
> <6>[ 157.399543] sysrq: Show State
> <6>[ 157.403061] task:systemd state:S stack:0 pid:1 tgid:1 ppid:0 task_flags:0x400100 flags:0x00080000
> <6>[ 157.403067] Call Trace:
> <6>[ 157.403069] <TASK>
> <6>[ 157.403072] __schedule+0x5d7/0x1ef0
> <6>[ 157.403078] ? lock_acquire+0xc4/0x300
> <6>[ 157.403084] ? schedule+0x10e/0x180
> <6>[ 157.403087] ? lock_release+0xcd/0x2b0
> <6>[ 157.403092] schedule+0x3a/0x180
> <6>[ 157.403094] schedule_hrtimeout_range_clock+0x112/0x120
> <6>[ 157.403097] ? do_epoll_wait+0x3e4/0x5b0
> <6>[ 157.403102] ? lock_release+0xcd/0x2b0
> <6>[ 157.403104] ? _raw_spin_unlock_irq+0x27/0x70
> <6>[ 157.403106] ? do_epoll_wait+0x3e4/0x5b0
> <6>[ 157.403110] schedule_hrtimeout_range+0x13/0x30
> `````````````````````````````````````````````````````````````````````````````````
> A detailed log can be found in [3].
>
> After bisecting the tree, the following patch [4] seems to be the first
> "bad" commit
>
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
> commit 25500ba7e77ce9d3d9b5a1929d41a2ee2e23f6fe
> Author: Matthew Wilcox (Oracle) willy@xxxxxxxxxxxxx
> Date:   Thu Mar 5 19:55:43 2026 +0000
>
>     locking/mutex: Remove the list_head from struct mutex
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
>
> We could not revert the patch because of a merge conflict, but resetting to
> the parent of the commit seems to fix the issue.
>
> Could you please check why the patch causes this regression and provide a
> fix if necessary?

Does this help?

---
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -40,10 +40,10 @@ __ww_waiter_last(struct mutex *lock)
 	__must_hold(&lock->wait_lock)
 {
 	struct mutex_waiter *w = lock->first_waiter;
+	if (!w)
+		return NULL;

-	if (w)
-		w = list_prev_entry(w, list);
-	return w;
+	return __ww_waiter_prev(lock, w);
 }

 static inline void