Re: BUG_ON(!newowner) in fixup_pi_state_owner()
From: Mike Galbraith
Date: Tue Nov 03 2020 - 19:57:11 EST
On Tue, 2020-11-03 at 17:31 -0600, Gratian Crisan wrote:
> Hi all,
>
> I apologize for waking up the futex demons (and replying to my own
> email), but ...
>
> Gratian Crisan writes:
> >
> > Brandon and I have been debugging a nasty race that leads to
> > BUG_ON(!newowner) in fixup_pi_state_owner() in futex.c. So far
> > we've only been able to reproduce the issue on 4.9.y-rt kernels.
> > We are still testing if this is a problem for later RT branches.
>
> I was able to reproduce the BUG_ON(!newowner) in fixup_pi_state_owner()
> with a 5.10.0-rc1-rt1 kernel (currently testing 5.10.0-rc2-rt4).
My box says it's generic, i.e. not RT-specific. This is a master kernel:
KERNEL: vmlinux-5.10.0.gb7cbaf5-master.gz
DUMPFILE: vmcore
CPUS: 8
DATE: Wed Nov 4 01:46:56 2020
UPTIME: 00:02:06
LOAD AVERAGE: 0.25, 0.15, 0.06
TASKS: 726
NODENAME: homer
RELEASE: 5.10.0.gb7cbaf5-master
VERSION: #26 SMP Tue Nov 3 14:10:35 CET 2020
MACHINE: x86_64 (3591 Mhz)
MEMORY: 16 GB
PANIC: ""
PID: 4631
COMMAND: "f_waiter"
TASK: ffff88818a1fb900 [THREAD_INFO: ffff88818a1fb900]
CPU: 1
STATE: TASK_RUNNING (PANIC)
crash.rt> bt
PID: 4631 TASK: ffff88818a1fb900 CPU: 1 COMMAND: "f_waiter"
#0 [ffff88816a0b3a58] machine_kexec at ffffffff8104b2dc
#1 [ffff88816a0b3aa0] __crash_kexec at ffffffff810fc97a
#2 [ffff88816a0b3b60] crash_kexec at ffffffff810fda55
#3 [ffff88816a0b3b70] oops_end at ffffffff81021813
#4 [ffff88816a0b3b90] do_trap at ffffffff8101eaec
#5 [ffff88816a0b3be0] do_error_trap at ffffffff8101ebd5
#6 [ffff88816a0b3c20] exc_invalid_op at ffffffff816d8bdb
#7 [ffff88816a0b3c40] asm_exc_invalid_op at ffffffff81800a62
#8 [ffff88816a0b3cc8] fixup_pi_state_owner at ffffffff810f065c
#9 [ffff88816a0b3d58] futex_wait_requeue_pi.constprop.0 at ffffffff810f1fcb
#10 [ffff88816a0b3ec8] do_futex at ffffffff810f2482
#11 [ffff88816a0b3ed8] __x64_sys_futex at ffffffff810f2ab5
#12 [ffff88816a0b3f40] do_syscall_64 at ffffffff816d88c3
#13 [ffff88816a0b3f50] entry_SYSCALL_64_after_hwframe at ffffffff8180007c
RIP: 00007f665b94f839 RSP: 00007f665b056e88 RFLAGS: 00000212
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f665b94f839
RDX: 0000000000000509 RSI: 000000000000008b RDI: 00000000006020c0
RBP: 00007f665b056ed0 R8: 00000000006020c4 R9: 0000000000000000
R10: 00007f665b056ef0 R11: 0000000000000212 R12: 00007ffd42284c3e
R13: 00007ffd42284c3f R14: 0000000000000000 R15: 00007ffd42284c40
ORIG_RAX: 00000000000000ca CS: 0033 SS: 002b
crash.rt> gdb list *0xffffffff810f065c
0xffffffff810f065c is in fixup_pi_state_owner (kernel/futex.c:2386).
2381
2382            /*
2383             * Since we just failed the trylock; there must be an owner.
2384             */
2385            newowner = rt_mutex_owner(&pi_state->pi_mutex);
2386            BUG_ON(!newowner);
2387    } else {
2388            WARN_ON_ONCE(argowner != current);
2389            if (oldowner == current) {
2390                    /*
crash.rt>
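
For reference, the user-space registers in the backtrace above can be read as the arguments of the failing futex() call. ORIG_RAX 0xca is 202, the x86_64 syscall number for futex, and with the x86_64 syscall convention (rdi, rsi, rdx, r10, r8) this gives uaddr=0x6020c0, op=0x8b, val=0x509, timeout=0x7f665b056ef0, uaddr2=0x6020c4. Below is a minimal sketch of decoding the op word. The helper names futex_cmd_name() and futex_is_private() are made up for illustration, and the constants are copied by hand from the uapi linux/futex.h header rather than included, so treat them as assumptions to check against your own headers.

```c
#include <stdio.h>

/* Command and flag values as defined in uapi linux/futex.h (copied here,
 * not included, so this sketch builds without kernel headers). */
#define FUTEX_WAIT              0
#define FUTEX_WAKE              1
#define FUTEX_LOCK_PI           6
#define FUTEX_UNLOCK_PI         7
#define FUTEX_TRYLOCK_PI        8
#define FUTEX_WAIT_REQUEUE_PI   11
#define FUTEX_CMP_REQUEUE_PI    12
#define FUTEX_PRIVATE_FLAG      128
#define FUTEX_CLOCK_REALTIME    256
#define FUTEX_CMD_MASK          (~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME))

/* Hypothetical helper: name the futex command with the flag bits masked off.
 * Only a subset of commands is listed; anything else reports "unknown". */
static const char *futex_cmd_name(unsigned int op)
{
    switch (op & FUTEX_CMD_MASK) {
    case FUTEX_WAIT:            return "FUTEX_WAIT";
    case FUTEX_WAKE:            return "FUTEX_WAKE";
    case FUTEX_LOCK_PI:         return "FUTEX_LOCK_PI";
    case FUTEX_UNLOCK_PI:       return "FUTEX_UNLOCK_PI";
    case FUTEX_TRYLOCK_PI:      return "FUTEX_TRYLOCK_PI";
    case FUTEX_WAIT_REQUEUE_PI: return "FUTEX_WAIT_REQUEUE_PI";
    case FUTEX_CMP_REQUEUE_PI:  return "FUTEX_CMP_REQUEUE_PI";
    default:                    return "unknown";
    }
}

/* Hypothetical helper: non-zero if the op carries FUTEX_PRIVATE_FLAG. */
static int futex_is_private(unsigned int op)
{
    return (op & FUTEX_PRIVATE_FLAG) != 0;
}
```

Calling futex_cmd_name(0x8b) with the RSI value from the dump yields FUTEX_WAIT_REQUEUE_PI with the private flag set (11 + 128 = 139 = 0x8b), which matches the futex_wait_requeue_pi frame in the backtrace.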