Re: [WARNING] RCU stall in sock_def_readable()
From: Paul E. McKenney
Date: Thu Apr 16 2026 - 20:16:20 EST

On Wed, Apr 15, 2026 at 01:27:22PM -0400, Steven Rostedt wrote:
> Hi,
>
> Did anything change recently with respect to RCU stall detection or
> sock_def_readable()? My tests have been failing pretty much in the same
> place every other time I run them. It's getting rather annoying. The test
> runs:
>
> trace-cmd record -p function -e syscalls ./hackbench 50
>
> (enables syscall trace events along with function tracing and records to a file)
>
> On a VM with the attached config (it has LOCKDEP enabled), and every
> other run the following triggers:
>
> [ 219.564147] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 219.566376] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-3): P3152/1:b..l
> [ 219.568975] rcu: (detected by 0, t=6502 jiffies, g=30165, q=175 ncpus=4)
> [ 219.571215] task:hackbench_64 state:R running task stack:0 pid:3152 tgid:3152 ppid:3145 task_flags:0x400000 flags:0x00080000
> [ 219.575470] Call Trace:
> [ 219.576432] <TASK>
> [ 219.577396] __schedule+0x4ac/0x12f0
> [ 219.578731] preempt_schedule_common+0x26/0xe0
> [ 219.580319] ? preempt_schedule_thunk+0x16/0x30
> [ 219.581956] preempt_schedule_thunk+0x16/0x30
> [ 219.583550] ? _raw_spin_unlock_irqrestore+0x39/0x70
> [ 219.585328] _raw_spin_unlock_irqrestore+0x5d/0x70
> [ 219.587039] sock_def_readable+0x9c/0x2b0
> [ 219.588509] unix_stream_sendmsg+0x2d7/0x710
> [ 219.590104] sock_write_iter+0x185/0x190
> [ 219.591591] vfs_write+0x457/0x5b0
> [ 219.592900] ksys_write+0xc8/0xf0
> [ 219.594173] do_syscall_64+0x117/0x1660
> [ 219.595586] ? irqentry_exit+0xd9/0x690
> [ 219.596992] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 219.598809] RIP: 0033:0x7f30e2eb0190
> [ 219.600239] RSP: 002b:00007ffd4e5f9368 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> [ 219.602828] RAX: ffffffffffffffda RBX: 00007ffd4e5f94f8 RCX: 00007f30e2eb0190
> [ 219.605254] RDX: 0000000000000001 RSI: 00007ffd4e5f938f RDI: 0000000000000006
> [ 219.607598] RBP: 00007ffd4e5f93e0 R08: 00057bcf00000000 R09: 0000000000000000
> [ 219.609969] R10: 00007f30e2dd14d0 R11: 0000000000000202 R12: 0000000000000000
> [ 219.612453] R13: 00007ffd4e5f9510 R14: 000055c31b856dd8 R15: 00007f30e2fdb020
> [ 219.614933] </TASK>
>
> Always with the exact same stacktrace!
>
> Note, the machine is fine. It's not locked up at all. It appears to be
> something that might be blocking RCU for a little longer than RCU would
> like.
>
> I'm not sure if RCU changed its stall detection in a way that makes it
> more sensitive, or if sock_def_readable() changed in a way that causes
> more contention, but this started with 7.0-rc1.
>
> Any ideas?
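
On the "more sensitive" question: the "t=6502 jiffies" in the report only
translates to wall-clock time once HZ is known, and the config is not
reproduced in this mail. A minimal sketch of the conversion, where
jiffies_to_secs is a hypothetical helper and the HZ values are assumptions
to be replaced with CONFIG_HZ from the attached config:

```shell
#!/bin/sh
# jiffies_to_secs is a hypothetical helper, not an existing tool.
# Usage: jiffies_to_secs <jiffies> <HZ>
# Prints the duration in seconds with millisecond precision,
# using integer arithmetic only.
jiffies_to_secs() {
    ms=$(( $1 * 1000 / $2 ))
    printf '%d.%03d\n' "$(( ms / 1000 ))" "$(( ms % 1000 ))"
}

# The HZ values below are assumptions; check CONFIG_HZ in the config.
for hz in 100 250 300 1000; do
    printf 'HZ=%-4s t=6502 jiffies is about %s s\n' \
        "$hz" "$(jiffies_to_secs 6502 "$hz")"
done
```

With HZ=250 this gives 26.008 s, with HZ=1000 it gives 6.502 s. The
timeout in effect on the running kernel can be read from
/sys/module/rcupdate/parameters/rcu_cpu_stall_timeout for comparison.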

One "hail Mary" thought is to revert this guy and see if it helps:

	d41e37f26b31 ("rcu: Fix rcu_read_unlock() deadloop due to softirq")

This commit fixes a bug, so we cannot revert it in mainline, but there
is some reason to believe that there are more bugs beyond the one that
it fixed, and it might have (through no fault of its own) made those
other bugs more probable.
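
A rough sketch of that experiment, assuming a throwaway branch of the
failing tree and that the revert applies cleanly. The function names are
hypothetical helpers, and the rebuild and install steps are left out
because they depend on the VM setup:

```shell
#!/bin/sh
# Sketch only: these functions are hypothetical helpers, not existing tools.

# Count RCU stall reports in kernel log text read from stdin.
# "|| true" keeps the exit status at 0 when grep finds no matches.
count_stalls() {
    grep -c 'detected stalls on CPUs/tasks' || true
}

# Revert the suspect commit (run from the kernel source tree, on a
# test branch). Rebuild and boot the result in the VM afterwards.
revert_suspect() {
    git revert --no-edit "${1:-d41e37f26b31}"
}

# Run the reproducer N times (default 10) inside the test VM, as root,
# and print the cumulative number of stall reports after each run.
run_reproducer() {
    runs=${1:-10}
    i=1
    while [ "$i" -le "$runs" ]; do
        trace-cmd record -p function -e syscalls ./hackbench 50
        echo "run $i: $(dmesg | count_stalls) stall report(s) so far"
        i=$((i + 1))
    done
}
```

Since the stall shows up about every other run, a handful of clean runs
in a row with the revert applied would be a reasonably strong hint.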

Worth a try, anyway!

							Thanx, Paul