Re: Regression on linux-next (next-20260324)

From: Borah, Chaitanya Kumar

Date: Mon Mar 30 2026 - 04:33:53 EST

On 3/27/2026 10:13 PM, Peter Zijlstra wrote:

On Fri, Mar 27, 2026 at 05:31:00PM +0100, Peter Zijlstra wrote:

On Fri, Mar 27, 2026 at 07:09:26PM +0530, Borah, Chaitanya Kumar wrote:

Hello Matthew,

Hope you are doing well. I am Chaitanya from the Linux graphics team at
Intel.

This mail is regarding a regression we are seeing in our CI runs[1] on
the linux-next repository.

Since version next-20260324 [2], we have been seeing the following regression:

`````````````````````````````````````````````````````````````````````````````````
<5>[ 157.361977] [IGT] Inactivity timeout exceeded. Killing the current test with SIGQUIT.
<6>[ 157.362097] sysrq: HELP : loglevel(0-9) reboot(b) crash(c) show-all-locks(d) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z) replay-kernel-logs(R)
<6>[ 157.399543] sysrq: Show State
<6>[ 157.403061] task:systemd state:S stack:0 pid:1 tgid:1 ppid:0 task_flags:0x400100 flags:0x00080000
<6>[ 157.403067] Call Trace:
<6>[ 157.403069] <TASK>
<6>[ 157.403072] __schedule+0x5d7/0x1ef0
<6>[ 157.403078] ? lock_acquire+0xc4/0x300
<6>[ 157.403084] ? schedule+0x10e/0x180
<6>[ 157.403087] ? lock_release+0xcd/0x2b0
<6>[ 157.403092] schedule+0x3a/0x180
<6>[ 157.403094] schedule_hrtimeout_range_clock+0x112/0x120
<6>[ 157.403097] ? do_epoll_wait+0x3e4/0x5b0
<6>[ 157.403102] ? lock_release+0xcd/0x2b0
<6>[ 157.403104] ? _raw_spin_unlock_irq+0x27/0x70
<6>[ 157.403106] ? do_epoll_wait+0x3e4/0x5b0
<6>[ 157.403110] schedule_hrtimeout_range+0x13/0x30
`````````````````````````````````````````````````````````````````````````````````
Detailed logs can be found in [3].

After bisecting the tree, the following patch [4] seems to be the first
"bad" commit:

`````````````````````````````````````````````````````````````````````````````````````````````````````````
commit 25500ba7e77ce9d3d9b5a1929d41a2ee2e23f6fe
Author: Matthew Wilcox (Oracle) willy@xxxxxxxxxxxxx
Date: Thu Mar 5 19:55:43 2026 +0000

locking/mutex: Remove the list_head from struct mutex
`````````````````````````````````````````````````````````````````````````````````````````````````````````

We could not revert the patch because of merge conflicts, but resetting to
the parent of the commit seems to fix the issue.

Could you please check why the patch causes this regression and provide a
fix if necessary?

Does this help?

More tidy version of the same...

---
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index b1834ab7e782..bb8b410779d4 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -42,7 +42,7 @@ __ww_waiter_last(struct mutex *lock)
 	struct mutex_waiter *w = lock->first_waiter;
 	if (w)
-		w = list_prev_entry(w, list);
+		w = __ww_waiter_prev(lock, w);
 	return w;
 }

Thank you for the response, Peter. Unfortunately, the issue is still seen with this change.

Regards
Chaitanya