Re: [REGRESSION] sched/core merge 1c3b68f0d55b: futex_waitv lost wakeup hangs RE Engine games and PID 1 init
From: Peter Zijlstra
Date: Tue Apr 21 2026 - 10:46:38 EST
On Tue, Apr 21, 2026 at 03:53:32PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 21, 2026 at 06:19:52PM +0500, Mikhail Gavrilov wrote:
> > Hi,
> >
> > I've bisected a user-visible regression to the sched/core merge for the
> > post-7.0 merge window. Resident Evil 2/3/4/9 running under Proton hang
> > deterministically during level load on any kernel built from a tree
> > that includes this merge, but run normally on its first parent. The same
> > lost-wakeup signature also appears at PID 1 startup on intermediate
> > bisect steps, preventing boot entirely in some cases.
> >
> > The bug reproduces on two independent workstations (ASUS and ASRock
> > B650 boards, both Ryzen 9 7950X + RX 7900 XTX + 64 GB DDR5), so it is
> > not board-specific and not a one-machine environmental issue.
> >
> > ## Summary
> >
> > - First bad: 1c3b68f0d55b ("Merge tag 'sched-core-2026-04-13' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
> > - Parent 1 (33c66eb5e984, master before merge): good
> > - Parent 2 (78cde54ea5f0, tip of sched/core): good
> > - Merge: bad
> >
> > Both parents are good in isolation; only the merge result exhibits the
> > bug. Linear bisection inside the sched/core branch (v7.0..78cde54ea5f0)
> > yields no first-bad commit, which is consistent with a semantic
> > conflict introduced by the merge itself rather than by any single
> > commit in the pulled branch.
> >
> > The bisect was run twice with different bookkeeping for inconclusive
> > steps (first treating boot-hang merges as 'skip', second re-testing
> > them and marking one, 88b29f3f, as 'bad' after observing the same
> > lost-wakeup signature during early init). Both runs converged on the
> > same first-bad commit 1c3b68f0. The 'good' steps inside
> > 78cde54ea5f0..1c3b68f0^ therefore reflect the tip-of-master state with
> > all the non-sched pull requests already merged but without the
> > sched/core pull, and they reproducibly pass the game test.
> >
> > Reproducibility is ~100% in both directions: every tested build that
> > includes the merge hangs on the first level-load attempt; every tested
> > build with parent 1 as the tip completes a full level playthrough and
> > a save-resume in both RE2 and RE9 without issue.
>
> And you're absolutely sure:
>
> 25500ba7e77c ("locking/mutex: Remove the list_head from struct mutex")
>
> isn't to blame? That appears to have broken ww_mutex, which is used
> quite heavily by the graphics stack.

Specifically, see this thread:
https://lore.kernel.org/r/95651a71-1adf-45ba-83eb-5744bc6d4a52@xxxxxxx