[Syzkaller & bisect] There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel

From: Pengfei Xu
Date: Wed Mar 15 2023 - 08:20:10 EST


Hi Li Huafei and kernel experts,

Greeting!

Platform: x86 platforms
There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel in guest:

All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230315_023443_synchronize_rcu
Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/repro.c
Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/kconfig_origin
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/bisect_info.log
v6.3-rc2 issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/eeac8ede17557680855031c6f305ece2378af326_dmesg.log
"
[ 24.844767] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=330 'systemd'
[ 31.392668] hrtimer: interrupt took 10726570 ns
[ 62.552360] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 2942 jiffies old.
[ 92.760386] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 10494 jiffies old.
[ 182.808389] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 33006 jiffies old.
[ 300.120396] INFO: task repro:540 blocked for more than 147 seconds.
[ 300.120937] Not tainted 6.3.0-rc2-eeac8ede1755+ #1
[ 300.121351] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 300.121888] task:repro state:D stack:0 pid:540 ppid:538 flags:0x00004006
[ 300.122552] Call Trace:
[ 300.122835] <TASK>
[ 300.123267] __schedule+0x40a/0xc90
[ 300.124351] ? wait_for_completion+0x7b/0x180
[ 300.124772] schedule+0x5b/0xe0
[ 300.125156] schedule_timeout+0x50b/0x670
[ 300.125829] ? schedule_timeout+0x9/0x670
[ 300.126414] ? wait_for_completion+0x7b/0x180
[ 300.126826] wait_for_completion+0xa6/0x180
[ 300.127478] __wait_rcu_gp+0x136/0x160
[ 300.127922] ? __pfx_arch_ftrace_ops_list_func+0x10/0x10
[ 300.128639] synchronize_rcu_tasks_generic.part.24+0x3b/0x60
[ 300.129092] ? __pfx_call_rcu_tasks+0x10/0x10
[ 300.129499] ? __pfx_wakeme_after_rcu+0x10/0x10
[ 300.130115] ? verify_cpu+0x10/0x100
[ 300.130574] synchronize_rcu_tasks_generic+0x24/0x70
[ 300.131020] synchronize_rcu_tasks+0x19/0x20
[ 300.131438] ftrace_shutdown+0x1cc/0x410
[ 300.132206] unregister_ftrace_function+0x35/0x230
[ 300.132722] ? __sanitizer_cov_trace_switch+0x57/0xa0
[ 300.133311] perf_ftrace_event_register+0x95/0xf0
[ 300.133750] ? __pfx_tp_perf_event_destroy+0x10/0x10
[ 300.134190] perf_trace_destroy+0x3a/0xa0
[ 300.134590] ? __pfx_tp_perf_event_destroy+0x10/0x10
[ 300.135028] tp_perf_event_destroy+0x1e/0x30
[ 300.135473] _free_event+0x101/0x810
[ 300.136191] put_event+0x3c/0x50
[ 300.136605] perf_event_release_kernel+0x2de/0x360
[ 300.137011] ? perf_event_release_kernel+0x9/0x360
[ 300.137577] ? __pfx_perf_release+0x10/0x10
[ 300.138031] perf_release+0x22/0x30
[ 300.138434] __fput+0x11f/0x450
[ 300.139038] ____fput+0x1e/0x30
[ 300.139425] task_work_run+0xb6/0x120
[ 300.140164] do_exit+0x547/0x1360
[ 300.140632] ? write_comp_data+0x2f/0x90
[ 300.141223] do_group_exit+0x5e/0xf0
[ 300.141732] get_signal+0x15d1/0x1600
[ 300.142596] arch_do_signal_or_restart+0x33/0x280
[ 300.143463] exit_to_user_mode_prepare+0x13b/0x210
[ 300.143955] syscall_exit_to_user_mode+0x2d/0x60
[ 300.144623] do_syscall_64+0x4a/0x90
[ 300.145032] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 300.145459] RIP: 0033:0x7f2a7abd859d
[ 300.145823] RSP: 002b:00007ffcc732b338 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
[ 300.146429] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 00007f2a7abd859d
[ 300.146930] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 00000000200004c0
[ 300.147436] RBP: 00007ffcc732b350 R08: 0000000000000000 R09: 00007ffcc732b350
[ 300.147938] R10: 00000000ffffffff R11: 0000000000000246 R12: 00000000004011e0
[ 300.148645] R13: 00007ffcc732b470 R14: 0000000000000000 R15: 0000000000000000
[ 300.149713] </TASK>
[ 300.149978]
[ 300.149978] Showing all locks held in the system:
[ 300.150435] 1 lock held by rcu_tasks_kthre/11:
[ 300.150808] #0: ffffffff83f63490 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
[ 300.151849] 1 lock held by rcu_tasks_rude_/12:
[ 300.152593] #0: ffffffff83f63210 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
[ 300.153645] 1 lock held by rcu_tasks_trace/13:
[ 300.154013] #0: ffffffff83f62f50 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
[ 300.155078] 1 lock held by khungtaskd/29:
[ 300.155425] #0: ffffffff83f63ea0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x1b/0x1e0
[ 300.156644] 2 locks held by repro/540:
[ 300.156977] #0: ffffffff840206a8 (event_mutex){+.+.}-{3:3}, at: perf_trace_destroy+0x21/0xa0
[ 300.157971] #1: ffffffff8401cd68 (ftrace_lock){+.+.}-{3:3}, at: unregister_ftrace_function+0x2b/0x230
[ 300.159005]
[ 300.159233] =============================================
[ 300.159233]
"
Bisected and found first bad commit:
"
0e792b89e6800cd9cb4757a76a96f7ef3e8b6294
ftrace: Fix use-after-free for dynamic ftrace_ops
"
After reverted the above commit on top of v6.3-rc2 kernel, this issue was gone.

And this issue could be reproduced in 2100s.

There was no this issue report in syzbot link:
https://syzkaller.appspot.com/upstream

I hope it's helpful.

Thanks!

---

If you don't need the following environment to reproduce the problem or if you
already have one, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0
// start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
// You could change the bzImage_xxx as you want
You could use below command to log in, there is no password for root.
ssh -p 10023 root@localhost

After login vm(virtual machine) successfully, you could transfer reproduced
binary to the vm by below way, and reproduce the problem in vm:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/

Get the bzImage for target kernel:
Please use target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage //x should equal or less than cpu num your pc has

Fill the bzImage file into above start3.sh to load the target kernel in vm.


Tips:
If you already have qemu-system-x86_64, please ignore below info.
If you want to install qemu v7.1.0 version:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl
make
make install

Thanks!
BR.