[BUG] RCU stall / hung rcu_gp: process_srcu blocked in synchronize_rcu_normal triggered by perf trace teardown on 7.0.0-rc1
From: Zw Tang
Date: Mon Mar 02 2026 - 06:27:15 EST
Hi,
I am reporting an RCU stall / hung task issue triggered by a syzkaller
reproducer on Linux 7.0.0-rc1.
The system gets stuck with RCU stalls and multiple hung tasks. The
rcu_gp workqueue shows in-flight “process_srcu”, and kworkers running
“process_srcu” are blocked for >143s. At the same time, a repro task
blocks during perf event teardown (close()), waiting on SRCU.
Key log excerpts:
“Showing busy workqueues and worker pools: workqueue rcu_gp ...
in-flight: process_srcu”
“rcu: INFO: rcu_preempt detected stalls on CPUs/tasks ... Tasks
blocked on level-0 rcu_node ...”
hung tasks:
kworker/... Workqueue: rcu_gp process_srcu
process_srcu -> try_check_zero -> synchronize_rcu_normal -> wait_for_completion
repro2 task blocked in close():
__x64_sys_close -> __fput -> perf_release -> perf_event_release_kernel
-> __free_event
-> perf_trace_destroy -> perf_trace_event_unreg -> __synchronize_srcu
-> wait_for_completion
My understanding / suspected wait chain:
repro task closes a perf fd and enters perf trace teardown:
perf_trace_event_unreg() waits for SRCU via __synchronize_srcu().
rcu_gp workers running process_srcu attempt to advance SRCU and end up calling
synchronize_rcu_normal(), waiting for a normal RCU grace period to complete.
RCU grace period does not complete (RCU stall reported on CPUs/tasks), causing:
__synchronize_srcu() to wait indefinitely, process_srcu to remain
blocked, and the system to stall.
This looks like a circular wait involving perf trace teardown + SRCU +
RCU GP progress (possibly aggravated by timing/RT scheduling).
Reproducer:
C reproducer: https://pastebin.com/raw/L9hDPCrP
kernel config: https://pastebin.com/raw/jCq8qdq7
console output : https://pastebin.com/raw/D8XETkeH
Kernel:
git tree: torvalds/linux
commit: 4d349ee5c7782f8b27f6cb550f112c5e26fff38d
kernel version: 7.0.0-rc1
hardware: QEMU Ubuntu 24.10
[ 170.750583] Showing busy workqueues and worker pools:
[ 170.750595] workqueue rcu_gp: flags=0x108
[ 170.750609] pwq 2: cpus=0 node=0 flags=0x0 nice=0 active=2 refcnt=3
[ 170.750635] in-flight: 9:process_srcu ,11:process_srcu
[ 170.750723] pool 2: cpus=0 node=0 flags=0x0 nice=0 hung=0s
workers=4 idle: 220
[ 173.609623] perf: interrupt took too long (69138 > 69021), lowering
kernel.pe0
[ 173.918434] perf: interrupt took too long (86584 > 86422), lowering
kernel.pe0
[ 174.394436] perf: interrupt took too long (109613 > 108230),
lowering kernel.0
[ 174.633565] perf: interrupt took too long (137226 > 137016),
lowering kernel.0
[ 174.652442] perf: interrupt took too long (171805 > 171532),
lowering kernel.0
[ 174.679472] perf: interrupt took too long (214868 > 214756),
lowering kernel.0
[ 174.728383] perf: interrupt took too long (268722 > 268585),
lowering kernel.0
[ 174.882802] perf: interrupt took too long (336002 > 335902),
lowering kernel.0
[ 271.443907] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 271.443951] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P4093
[ 271.443972] rcu: (detected by 1, t=105002 jiffies, g=33573,
q=4342 ncpus=2)
[ 271.443986] task:repro2 state:R running task
stack:28792 pid:4093 tgid:4092 ppid:300 task_flags:0x400140
flags:0x00080012
[ 271.444042] Call Trace:
[ 271.444051] <IRQ>
[ 271.444058] sched_show_task+0x3b5/0x650
[ 271.444088] ? trace_event_raw_event_sched_pi_setprio+0x440/0x440
[ 271.444121] ? rcu_dump_cpu_stacks+0x337/0x4b0
[ 271.444146] ? write_comp_data+0x1f/0x70
[ 271.444164] ? __sanitizer_cov_trace_pc+0x1a/0x40
[ 271.444182] ? wq_watchdog_touch+0xec/0x170
[ 271.444198] rcu_sched_clock_irq+0x297b/0x31c0
[ 271.444221] ? tmigr_requires_handle_remote_up+0x143/0x1c0
[ 271.444245] ? rcu_momentary_eqs+0x40/0x40
[ 271.444261] ? __tmigr_cpu_deactivate+0x1a0/0x1a0
[ 271.444283] ? __sanitizer_cov_trace_pc+0x1a/0x40
[ 271.444301] ? tmigr_requires_handle_remote+0x1cd/0x2a0
[ 271.444324] ? tmigr_handle_remote+0x320/0x320
[ 271.444348] ? write_comp_data+0x1f/0x70
[ 271.444366] ? write_comp_data+0x1f/0x70
[ 271.444383] ? __cgroup_account_cputime_field+0xb9/0x160
[ 271.444406] ? write_comp_data+0x1f/0x70
[ 271.444424] ? hrtimer_run_queues+0x64/0x450
[ 271.444447] update_process_times+0xfa/0x200
[ 271.444469] tick_nohz_handler+0x504/0x720
[ 271.444487] ? find_held_lock+0x2b/0x80
[ 271.444505] ? tick_do_update_jiffies64+0x380/0x380
[ 271.444523] ? lock_release+0xc9/0x2a0
[ 271.444548] __hrtimer_run_queues+0x771/0xb20
[ 271.444568] ? tick_do_update_jiffies64+0x380/0x380
[ 271.444589] ? write_comp_data+0x1f/0x70
[ 271.444607] ? enqueue_hrtimer+0x360/0x360
[ 271.444626] ? __sanitizer_cov_trace_pc+0x1a/0x40
[ 271.444649] hrtimer_interrupt+0x36e/0x820
[ 271.444676] __sysvec_apic_timer_interrupt+0xb5/0x3b0
[ 271.444695] sysvec_apic_timer_interrupt+0x6b/0x80
[ 271.444720] </IRQ>
[ 271.444724] <TASK>
[ 271.444730] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 271.444748] RIP: 0010:finish_task_switch+0x126/0x5c0
[ 271.444771] Code: f6 4d 8d 7d 48 e8 8a 45 0a 00 31 f6 4c 89 ef e8
b0 fb ff ff 4c 89 ff e8 88 db b9 02 e8 93 c6 34 00 fb 65 4c 8b 3d 5a
34 cb 04 <49> 8d bf c8 14 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa
48 c1
[ 271.444786] RSP: 0018:ffff888009db79b0 EFLAGS: 00000202
[ 271.444798] RAX: 00000000001a071f RBX: ffff88800d6b4c80 RCX: 0000000000000006
[ 271.444808] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81467bbd
[ 271.444817] RBP: ffff888009db79f8 R08: 0000000000000001 R09: 0000000000000001
[ 271.444826] R10: fffffbfff0ae32ea R11: ffffffff85719757 R12: ffff8880073f9a80
[ 271.444836] R13: ffff88806d338fc0 R14: 0000000000000001 R15: ffff888009d19a80
[ 271.444852] ? finish_task_switch+0x11d/0x5c0
[ 271.444876] ? finish_task_switch+0xe0/0x5c0
[ 271.444897] ? __sanitizer_cov_trace_pc+0x1a/0x40
[ 271.444914] ? __switch_to+0x854/0x1270
[ 271.444936] __schedule+0x1198/0x4190
[ 271.444957] ? io_schedule_timeout+0x80/0x80
[ 271.444974] ? mark_held_locks+0x49/0x70
[ 271.444998] preempt_schedule_irq+0x4e/0x90
[ 271.445015] irqentry_exit+0x17b/0x6c0
[ 271.445037] ? irqentry_enter+0x2a/0xd0
[ 271.445058] ? trace_hardirqs_off_finish+0x12f/0x160
[ 271.445080] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 271.445096] RIP: 0010:__sanitizer_cov_trace_pc+0xd/0x40
[ 271.445127] Code: 00 00 ff 00 09 c2 75 07 4c 8b 81 e0 14 00 00 4c
89 c0 c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa bf 02 00 00 00 4c
8b 0c 24 <65> 48 8b 35 9b a7 9e 04 e8 46 ff ff ff 84 c0 74 20 48 8b 96
d0 14
[ 271.445141] RSP: 0018:ffff888009db7c28 EFLAGS: 00000246
[ 271.445152] RAX: dffffc0000000000 RBX: ffff88800df47b58 RCX: 0000000000000000
[ 271.445161] RDX: 1ffff11001be8f71 RSI: ffff888009d19a80 RDI: 0000000000000002
[ 271.445171] RBP: ffff888009d18000 R08: 0000000000000000 R09: ffffffff8167adf7
[ 271.445180] R10: ffffed1001d87110 R11: ffff88800ec38883 R12: 0000000000000001
[ 271.445190] R13: ffff888009db7cd0 R14: ffff888009d18028 R15: 00000000000000ec
[ 271.445203] ? __futex_wake_mark+0xb7/0xe0
[ 271.445222] ? __sanitizer_cov_trace_pc+0x1a/0x40
[ 271.445240] __futex_wake_mark+0xb7/0xe0
[ 271.445255] futex_wake_mark+0xa4/0x190
[ 271.445272] futex_wake+0x441/0x540
[ 271.445289] ? futex_wake_mark+0x190/0x190
[ 271.445306] ? percpu_counter_add_batch+0x11b/0x260
[ 271.445329] ? write_comp_data+0x1f/0x70
[ 271.445348] do_futex+0x26b/0x360
[ 271.445363] ? __ia32_sys_get_robust_list+0x140/0x140
[ 271.445380] ? lock_is_held_type+0x9b/0x110
[ 271.445404] __x64_sys_futex+0x1c9/0x480
[ 271.445420] ? do_futex+0x360/0x360
[ 271.445433] ? write_comp_data+0x1f/0x70
[ 271.445450] ? fd_install+0x1ec/0x4e0
[ 271.445471] ? __sanitizer_cov_trace_pc+0x1a/0x40
[ 271.445488] ? __sys_socket+0x4b/0x130
[ 271.445508] do_syscall_64+0x115/0x690
[ 271.445529] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 271.445544] RIP: 0033:0x7fdfffbe4fc9
[ 271.445568] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 8e 0d 00 f7 d8 64 89
01 48
[ 271.445582] RSP: 002b:00007fdfffaebde8 EFLAGS: 00000246 ORIG_RAX:
00000000000000ca
[ 271.445595] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fdfffbe4fc9
[ 271.445605] RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 000055b24d1850ec
[ 271.445614] RBP: 00007fdfffaebe00 R08: 0000000000000000 R09: 0000000000000000
[ 271.445623] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffb7d2901e
[ 271.445632] R13: 00007fffb7d2901f R14: 00007fdfffaebf00 R15: 0000000000022000
[ 271.445650] </TASK>
[ 318.431885] INFO: task kworker/0:0:9 blocked for more than 143 seconds.
[ 318.431913] Not tainted 7.0.0-rc1-00301-g4d349ee5c778 #1
[ 318.431921] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 318.431928] task:kworker/0:0 state:D stack:27912 pid:9
tgid:9 ppid:2 task_flags:0x4208060 flags:0x00080000
[ 318.431983] Workqueue: rcu_gp process_srcu
[ 318.432032] Call Trace:
[ 318.432038] <TASK>
[ 318.432049] __schedule+0x1190/0x4190
[ 318.432078] ? io_schedule_timeout+0x80/0x80
[ 318.432095] ? lock_release+0xc9/0x2a0
[ 318.432119] schedule+0xd1/0x260
[ 318.432134] schedule_timeout+0x240/0x280
[ 318.432157] ? hrtimer_nanosleep_restart+0x340/0x340
[ 318.432182] ? mark_held_locks+0x49/0x70
[ 318.432203] ? lockdep_hardirqs_on_prepare+0xd7/0x170
[ 318.432224] ? _raw_spin_unlock_irq+0x23/0x40
[ 318.432245] ? trace_hardirqs_on+0x18/0x170
[ 318.432267] wait_for_completion+0x169/0x320
[ 318.432286] ? wait_for_completion_killable+0x410/0x410
[ 318.432304] ? mark_held_locks+0x49/0x70
[ 318.432324] ? lockdep_hardirqs_on_prepare+0xd7/0x170
[ 318.432348] ? _raw_spin_unlock_irqrestore+0x2c/0x50
[ 318.432372] ? trace_hardirqs_on+0x18/0x170
[ 318.432390] ? _raw_spin_unlock_irqrestore+0x2c/0x50
[ 318.432414] synchronize_rcu_normal+0x208/0x5f0
[ 318.432433] ? start_poll_synchronize_rcu_full+0x90/0x90
[ 318.432453] ? do_raw_spin_lock+0x123/0x290
[ 318.432481] ? _raw_spin_unlock_irqrestore+0x2c/0x50
[ 318.432504] ? rt_mutex_slowunlock+0x824/0xaf0
[ 318.432526] ? lock_is_held_type+0x9b/0x110
[ 318.432555] try_check_zero+0x429/0x630
[ 318.432583] process_srcu+0x4c1/0x16c0
[ 318.432607] ? lock_acquire+0x187/0x2e0
[ 318.432626] ? process_scheduled_works+0x4ca/0x1ac0
[ 318.432650] ? lock_release+0xc9/0x2a0
[ 318.432674] process_scheduled_works+0x553/0x1ac0
[ 318.432704] ? insert_work+0x190/0x190
[ 318.432723] ? write_comp_data+0x1f/0x70
[ 318.432741] ? __sanitizer_cov_trace_pc+0x1a/0x40
[ 318.432761] ? lock_is_held_type+0x9b/0x110
[ 318.432784] ? __sanitizer_cov_trace_pc+0x1a/0x40
[ 318.432805] worker_thread+0x5a9/0xd10
[ 318.432830] ? bh_worker+0x740/0x740
[ 318.435937] kthread+0x3f9/0x530
[ 318.435991] ? kthread_affine_node+0x2a0/0x2a0
[ 318.436060] ret_from_fork+0x666/0xab0
[ 318.436098] ? native_tss_update_io_bitmap+0x6c0/0x6c0
[ 318.436139] ? write_comp_data+0x1f/0x70
[ 318.436175] ? __sanitizer_cov_trace_pc+0x1a/0x40
[ 318.436213] ? __switch_to+0x854/0x1270
[ 318.436255] ? kthread_affine_node+0x2a0/0x2a0
[ 318.436301] ret_from_fork_asm+0x11/0x20
[ 318.436360] </TASK>
[ 318.436373] INFO: task kworker/0:1:11 blocked for more than 143 seconds.
[ 318.436393] Not tainted 7.0.0-rc1-00301-g4d349ee5c778 #1
[ 318.436411] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 318.436422] task:kworker/0:1 state:D stack:26592 pid:11
tgid:11 ppid:2 task_flags:0x4208060 flags:0x00080000
[ 318.436528] Workqueue: rcu_gp process_srcu
[ 318.436571] Call Trace:
[ 318.436580] <TASK>
[ 318.436595] __schedule+0x1190/0x4190
[ 318.436639] ? io_schedule_timeout+0x80/0x80
[ 318.436676] ? lock_release+0xc9/0x2a0
[ 318.436726] schedule+0xd1/0x260
[ 318.436759] schedule_timeout+0x240/0x280
[ 318.436804] ? hrtimer_nanosleep_restart+0x340/0x340
[ 318.522895] ? mark_held_locks+0x49/0x70
[ 318.522930] ? lockdep_hardirqs_on_prepare+0xd7/0x170
[ 318.522952] ? _raw_spin_unlock_irq+0x23/0x40
[ 318.522977] ? trace_hardirqs_on+0x18/0x170
[ 318.523008] wait_for_completion+0x169/0x320
[ 318.523029] ? wait_for_completion_killable+0x410/0x410
[ 318.523047] ? mark_held_locks+0x49/0x70
[ 318.523068] ? lockdep_hardirqs_on_prepare+0xd7/0x170
[ 318.523088] ? _raw_spin_unlock_irqrestore+0x2c/0x50
[ 318.523110] ? trace_hardirqs_on+0x18/0x170
[ 318.523128] ? _raw_spin_unlock_irqrestore+0x2c/0x50
[ 318.523152] synchronize_rcu_normal+0x208/0x5f0
[ 318.523171] ? start_poll_synchronize_rcu_full+0x90/0x90
[ 318.523190] ? do_raw_spin_lock+0x123/0x290
[ 318.523219] ? _raw_spin_unlock_irqrestore+0x2c/0x50
[ 318.523242] ? rt_mutex_slowunlock+0x824/0xaf0
[ 318.523264] ? lock_is_held_type+0x9b/0x110
[ 318.523294] try_check_zero+0x429/0x630
[ 318.523321] process_srcu+0x4c1/0x16c0
[ 318.523346] ? lock_acquire+0x187/0x2e0
[ 318.523365] ? process_scheduled_works+0x4ca/0x1ac0
[ 318.523389] ? lock_release+0xc9/0x2a0
[ 318.523412] process_scheduled_works+0x553/0x1ac0
[ 318.523442] ? insert_work+0x190/0x190
[ 318.523461] ? write_comp_data+0x1f/0x70
[ 318.523478] ? __sanitizer_cov_trace_pc+0x1a/0x40
[ 318.523498] ? lock_is_held_type+0x9b/0x110
[ 318.523521] ? __sanitizer_cov_trace_pc+0x1a/0x40
[ 318.523541] worker_thread+0x5a9/0xd10
[ 318.523566] ? bh_worker+0x740/0x740
[ 318.523587] kthread+0x3f9/0x530
[ 318.523605] ? kthread_affine_node+0x2a0/0x2a0
[ 318.523627] ret_from_fork+0x666/0xab0
[ 318.523645] ? native_tss_update_io_bitmap+0x6c0/0x6c0
[ 318.523664] ? write_comp_data+0x1f/0x70
[ 318.523680] ? __sanitizer_cov_trace_pc+0x1a/0x40
[ 318.523698] ? __switch_to+0x854/0x1270
[ 318.523717] ? kthread_affine_node+0x2a0/0x2a0
[ 318.523738] ret_from_fork_asm+0x11/0x20
[ 318.523764] </TASK>
[ 318.526244] INFO: task repro2:4086 blocked for more than 143 seconds.
[ 318.526258] Not tainted 7.0.0-rc1-00301-g4d349ee5c778 #1
[ 318.526267] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 318.526273] task:repro2 state:D stack:25512 pid:4086
tgid:4086 ppid:298 task_flags:0x400040 flags:0x00080002
executing program[ 318.526322] Call Trace:
[ 318.526327] <TASK>
[ 318.526334] __schedule+0x1190/0x4190
[ 318.526356] ? io_schedule_timeout+0x80/0x80
[ 318.526373] ? lock_release+0xc9/0x2a0
[ 318.526396] schedule+0xd1/0x260
[ 318.526411] schedule_timeout+0x240/0x280
[ 318.526432] ? hrtimer_nanosleep_restart+0x340/0x340
[ 318.526458] ? mark_held_locks+0x49/0x70
[ 318.526478] ? lockdep_hardirqs_on_prepare+0xd7/0x170
[ 318.526499] ? _raw_spin_unlock_irq+0x23/0x40
[ 318.526520] ? trace_hardirqs_on+0x18/0x170
[ 318.526539] wait_for_completion+0x169/0x320
[ 318.526556] ? __lock_acquire+0x55a/0x1ef0
[ 318.526577] ? wait_for_completion_killable+0x410/0x410
[ 318.526597] ? lockdep_init_map_type+0x4b/0x210
[ 318.526621] __synchronize_srcu+0x143/0x230
[ 318.526643] ? start_poll_synchronize_srcu+0x10/0x10
[ 318.526666] ? rcu_tasks_pregp_step+0x10/0x10
[ 318.526687] ? kvm_clock_read+0x3b/0x60
[ 318.526707] ? write_comp_data+0x1f/0x70
[ 318.526723] ? __sanitizer_cov_trace_pc+0x1a/0x40
[ 318.526741] ? ktime_get_mono_fast_ns+0x19d/0x2b0
[ 318.526766] ? synchronize_srcu+0x53/0x260
[ 318.526789] perf_trace_event_unreg.isra.0+0xb8/0x1e0
[ 318.526807] perf_trace_destroy+0xc3/0x1c0
[ 318.526822] ? perf_tp_event_init+0x120/0x120
[ 318.529171] __free_event+0x257/0xc10
[ 318.529200] ? perf_event_release_kernel+0x460/0x460
[ 318.529215] put_event+0x3c/0x90
[ 318.529230] perf_event_release_kernel+0x357/0x460
[ 318.529248] ? perf_event_release_kernel+0x460/0x460
[ 318.529263] perf_release+0x37/0x50
[ 318.529277] __fput+0x420/0xb80
[ 318.529303] fput_close_sync+0x10f/0x230
[ 318.529325] ? alloc_file_clone+0x110/0x110
[ 318.529345] ? dnotify_flush+0x7f/0x4c0
[ 318.529372] __x64_sys_close+0x8f/0x120
[ 318.529390] do_syscall_64+0x115/0x690
[ 318.529410] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 318.529425] RIP: 0033:0x7fdfffcd611b
[ 318.529436] RSP: 002b:00007fffb7d290d0 EFLAGS: 00000293 ORIG_RAX:
0000000000000003
[ 318.529450] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fdfffcd611b
[ 318.529460] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
[ 318.529469] RBP: 00007fffb7d29100 R08: 0000000000000000 R09: 0000000000000032
[ 318.529478] R10: 0000000000000000 R11: 0000000000000293 R12: 000055b24d16e3f0
[ 318.529486] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 318.529503] </TASK>
Could you please advise whether this is a known issue/regression, and
what additional debug data would be most helpful? I can rerun with
extra debugging options if needed.
Thanks,
Zw Tang