[Syzkaller & bisect] There is task hung in acct_process in v6.10 and v6.11-rc1
From: Pengfei Xu
Date: Mon Jul 29 2024 - 03:07:01 EST
Hi Azat Khuzhin,
Greetings!
There is "task hung in acct_process" in v6.10 and v6.11-rc1:
Found first bad commit:
c9e4bf607d8c PM: hibernate: Fix writing maj:min to /sys/power/resume
Revert this commit and the issue could not be reproduced.
All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/240728_080131_acct_process
Syzkaller repro code: https://github.com/xupengfe/syzkaller_logs/blob/main/240728_080131_acct_process/repro.c
Syzkaller syscall repro steps: https://github.com/xupengfe/syzkaller_logs/blob/main/240728_080131_acct_process/repro.prog
Syzkaller analysis report: https://github.com/xupengfe/syzkaller_logs/blob/main/240728_080131_acct_process/repro.report
Kconfig(make olddefconfig): https://github.com/xupengfe/syzkaller_logs/blob/main/240728_080131_acct_process/kconfig_origin
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/240728_080131_acct_process/bisect_info.log
Issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/240728_080131_acct_process/0c3836482481200ead7b416ca80c68a29cfdaabd_dmesg.log
v6.10 bzImage: https://github.com/xupengfe/syzkaller_logs/raw/main/240728_080131_acct_process/bzImage_0c3836482481200ead7b416ca80c68a29cfdaabd.tar.gz
This issue could be reproduced in 600s in guest.
"
[ 300.358849] INFO: task systemd-hostnam:283 blocked for more than 147 seconds.
[ 300.358511] Not tainted 6.10.0-0c3836482481 #1
[ 300.358849] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 300.359357] task:systemd-hostnam state:D stack:0 pid:283 tgid:283 ppid:1 flags:0x00000002
[ 300.359958] Call Trace:
[ 300.360137] <TASK>
[ 300.360293] __schedule+0xba3/0x3120
[ 300.360589] ? __pfx___schedule+0x10/0x10
[ 300.360866] ? lock_release+0x418/0x840
[ 300.361124] ? __pfx_lock_release+0x10/0x10
[ 300.361401] ? __pfx_lock_release+0x10/0x10
[ 300.361668] ? lock_is_held_type+0xef/0x150
[ 300.361978] schedule+0xf6/0x3f0
[ 300.362201] schedule_preempt_disabled+0x1c/0x30
[ 300.362501] __mutex_lock+0xc32/0x1660
[ 300.362751] ? acct_process+0x1ef/0x670
[ 300.363011] ? __pfx___mutex_lock+0x10/0x10
[ 300.363278] ? __this_cpu_preempt_check+0x21/0x30
[ 300.363581] ? lock_release+0x418/0x840
[ 300.363849] mutex_lock_nested+0x1f/0x30
[ 300.364101] ? mutex_lock_nested+0x1f/0x30
[ 300.364368] acct_process+0x1ef/0x670
[ 300.364607] ? acct_process+0xa1/0x670
[ 300.364867] do_exit+0x198d/0x29f0
[ 300.365095] ? lock_release+0x418/0x840
[ 300.365351] ? __pfx_do_exit+0x10/0x10
[ 300.365599] ? __this_cpu_preempt_check+0x21/0x30
[ 300.365919] ? _raw_spin_unlock_irq+0x2c/0x60
[ 300.366205] ? lockdep_hardirqs_on+0x89/0x110
[ 300.366497] do_group_exit+0xe4/0x2c0
[ 300.366738] __x64_sys_exit_group+0x4d/0x60
[ 300.367006] x64_sys_call+0x1a1f/0x20d0
[ 300.367259] do_syscall_64+0x6d/0x140
[ 300.367505] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 300.367827] RIP: 0033:0x7f8ee7118a4d
[ 300.368059] RSP: 002b:00007fffeace3298 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 300.368523] RAX: ffffffffffffffda RBX: 00007f8ee71f69e0 RCX: 00007f8ee7118a4d
[ 300.368955] RDX: 00000000000000e7 RSI: fffffffffffffe88 RDI: 0000000000000000
[ 300.369386] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f8ee7201b20
[ 300.369828] R10: 00007fffeace2fa0 R11: 0000000000000246 R12: 00007f8ee71f69e0
[ 300.370265] R13: 00007f8ee71fbf00 R14: 0000000000000002 R15: 00007f8ee71fbee8
[ 300.370721] </TASK>
[ 300.370867] INFO: task gmain:291 blocked for more than 147 seconds.
[ 300.371257] Not tainted 6.10.0-0c3836482481 #1
[ 300.371566] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 300.372040] task:gmain state:D stack:0 pid:291 tgid:290 ppid:1 flags:0x00004002
[ 300.372617] Call Trace:
[ 300.372778] <TASK>
[ 300.372922] __schedule+0xba3/0x3120
[ 300.373167] ? __pfx___schedule+0x10/0x10
[ 300.373420] ? lock_release+0x418/0x840
[ 300.373676] ? __pfx_lock_release+0x10/0x10
[ 300.373954] ? __pfx_lock_release+0x10/0x10
[ 300.374226] ? lock_is_held_type+0xef/0x150
[ 300.374499] schedule+0xf6/0x3f0
[ 300.374724] schedule_preempt_disabled+0x1c/0x30
[ 300.375022] __mutex_lock+0xc32/0x1660
[ 300.375271] ? acct_process+0x1ef/0x670
[ 300.375528] ? __pfx___mutex_lock+0x10/0x10
[ 300.375800] ? __this_cpu_preempt_check+0x21/0x30
[ 300.376105] ? lock_release+0x418/0x840
[ 300.376369] mutex_lock_nested+0x1f/0x30
[ 300.376627] ? mutex_lock_nested+0x1f/0x30
[ 300.376893] acct_process+0x1ef/0x670
[ 300.377131] ? acct_process+0xa1/0x670
[ 300.377380] do_exit+0x198d/0x29f0
[ 300.377607] ? __this_cpu_preempt_check+0x21/0x30
[ 300.377920] ? lock_release+0x418/0x840
[ 300.378168] ? __pfx_do_exit+0x10/0x10
[ 300.378414] do_group_exit+0xe4/0x2c0
[ 300.378654] get_signal+0x2387/0x2460
[ 300.378907] ? __pfx_get_signal+0x10/0x10
[ 300.379175] arch_do_signal_or_restart+0x8e/0x7d0
[ 300.379470] ? seqcount_lockdep_reader_access.constprop.0+0xb4/0xd0
[ 300.379874] ? __pfx_arch_do_signal_or_restart+0x10/0x10
[ 300.380207] ? trace_hardirqs_on+0x51/0x60
[ 300.380480] ? __this_cpu_preempt_check+0x21/0x30
[ 300.380780] ? syscall_exit_to_user_mode+0x109/0x1f0
[ 300.381098] syscall_exit_to_user_mode+0x13e/0x1f0
[ 300.381405] do_syscall_64+0x79/0x140
[ 300.381644] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 300.381973] RIP: 0033:0x7fba2bf4296f
[ 300.382210] RSP: 002b:00007fba2b5ff1c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000007
[ 300.382683] RAX: fffffffffffffdfc RBX: 00007fba2c2ce071 RCX: 00007fba2bf4296f
[ 300.383128] RDX: 00000000ffffffff RSI: 0000000000000001 RDI: 000055c951da3b10
[ 300.383577] RBP: 000055c951da3b10 R08: 0000000000000000 R09: 0000000000000000
[ 300.384022] R10: 00007fba2c70b080 R11: 0000000000000293 R12: 0000000000000001
[ 300.384462] R13: 0000000000000001 R14: 00007fba2b5ff230 R15: 000055c951dbd270
[ 300.384918] </TASK>
[ 300.385071] INFO: task gmain:350 blocked for more than 147 seconds.
[ 300.385460] Not tainted 6.10.0-0c3836482481 #1
[ 300.385780] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables thi
"
I hope it's helpful.
Thanks!
---
If you don't need the following environment to reproduce the problem or if you
already have one reproduced environment, please ignore the following information.
How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0
// start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
// You could change the bzImage_xxx as you want
// Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version
You could use below command to log in, there is no password for root.
ssh -p 10023 root@localhost
After login vm(virtual machine) successfully, you could transfer reproduced
binary to the vm by below way, and reproduce the problem in vm:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/
Get the bzImage for target kernel:
Please use target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage //x should equal or less than cpu num your pc has
Fill the bzImage file into above start3.sh to load the target kernel in vm.
Tips:
If you already have qemu-system-x86_64, please ignore below info.
If you want to install qemu v7.1.0 version:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install
Best Regards,
Thanks!