[Syzkaller & bisect] There is WARNING: suspicious RCU usage in mas_walk in v6.3-rc1

From: Pengfei Xu
Date: Tue Mar 07 2023 - 10:16:20 EST


Hi Matthew Wilcox,

Platform: x86 platforms

There is a "WARNING: suspicious RCU usage" in mas_walk in v6.3-rc1.

All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230306_221524_mas_walk
Reproducer code: https://github.com/xupengfe/syzkaller_logs/blob/main/230306_221524_mas_walk/repro.c
v6.3-rc1 problem dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230306_221524_mas_walk/repro.c
Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230306_221524_mas_walk/kconfig_origin
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230306_221524_mas_walk/bisect_info.log

"
[ 62.854989] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=337 'systemd'

[ 87.245186] =============================
[ 87.245689] WARNING: suspicious RCU usage
[ 87.246234] 6.3.0-rc1-fe15c26ee26e+ #1 Not tainted
[ 87.246824] -----------------------------
[ 87.247319] lib/maple_tree.c:856 suspicious rcu_dereference_check() usage!
[ 87.248149]
other info that might help us debug this:

[ 87.249112]
rcu_scheduler_active = 2, debug_locks = 1
[ 87.249983] 5 locks held by repro/3075:
[ 87.250463] #0: ffff88800a984448 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x9f/0x170
[ 87.251461] #1: ffff88800de3ca88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x190/0x290
[ 87.252549] #2: ffff888007bce938 (kn->active#79){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x199/0x290
[ 87.253674] #3: ffffffff840425a8 (ksm_thread_mutex){+.+.}-{3:3}, at: run_store+0x88/0x4d0
[ 87.254718] #4: ffff88800dc61b18 (&mm->mmap_lock){++++}-{3:3}, at: run_store+0x1b9/0x4d0
[ 87.255722]
stack backtrace:
[ 87.256259] CPU: 1 PID: 3075 Comm: repro Not tainted 6.3.0-rc1-fe15c26ee26e+ #1
[ 87.257150] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 87.258514] Call Trace:
[ 87.258828] <TASK>
[ 87.259105] dump_stack_lvl+0xe0/0x110
[ 87.259598] dump_stack+0x19/0x20
[ 87.260023] lockdep_rcu_suspicious+0x122/0x1a0
[ 87.260598] mas_walk+0x27a/0x340
[ 87.261038] mas_find+0xe2/0x140
[ 87.261460] run_store+0x1d3/0x4d0
[ 87.261905] ? __pfx_run_store+0x10/0x10
[ 87.262404] kobj_attr_store+0x3f/0x70
[ 87.262886] sysfs_kf_write+0x69/0x90
[ 87.263356] ? __pfx_sysfs_kf_write+0x10/0x10
[ 87.263908] kernfs_fop_write_iter+0x1ce/0x290
[ 87.264473] vfs_write+0x577/0x7c0
[ 87.264927] ksys_write+0x9f/0x170
[ 87.265371] __x64_sys_write+0x27/0x30
[ 87.265851] do_syscall_64+0x3b/0x90
[ 87.266309] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 87.266935] RIP: 0033:0x7f1eae54759d
[ 87.267389] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 8
[ 87.269603] RSP: 002b:00007ffd266b4618 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 87.270519] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1eae54759d
[ 87.271378] RDX: 0000000000000002 RSI: 0000000020000140 RDI: 0000000000000003
[ 87.272229] RBP: 00007ffd266b4630 R08: 00007ffd266b4630 R09: 00007ffd266b4630
[ 87.273084] R10: 00007ffd266b4630 R11: 0000000000000202 R12: 0000000000401190
[ 87.273935] R13: 00007ffd266b4750 R14: 0000000000000000 R15: 0000000000000000
[ 87.274802] </TASK>
"
This issue also exists in the v6.2 kernel.
I bisected between v6.2 and v5.11 and found the first bad commit:
"
a5f18ba0727656bd1fe3bcdb0d563f81790f9a04
mm/ksm: use vma iterators instead of vma linked list
"

This is only the suspected problem commit: reverting it on top of v6.2
failed, so I could not double-confirm it.
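
When re-checking a kernel (for example while repeating the bisect), the splat
can be detected in a saved dmesg log by matching two strings it prints. This
is only a minimal sketch: the function name has_rcu_splat and the log file
name are hypothetical and are not part of the logs linked above.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if a saved dmesg log contains the lockdep
# RCU splat reported in this mail, non-zero otherwise.
has_rcu_splat() {
    log="$1"
    # Both fixed strings are copied from the dmesg output quoted above.
    grep -qF 'WARNING: suspicious RCU usage' "$log" &&
        grep -qF 'mas_walk+' "$log"
}

# Example usage inside the VM (dmesg.log is an assumed file name):
#   dmesg > dmesg.log
#   has_rcu_splat dmesg.log && echo "reproduced"
```

Matching on the call-trace entry as well as the banner avoids counting other
"suspicious RCU usage" reports as this issue.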


The syzbot dashboard has a similar issue, but it does not provide the bad commit:
https://syzkaller.appspot.com/bug?id=64a3e95957cd3deab99df7cd7b5a9475af92c93e


I hope this is useful.

---

If you don't need the following environment to reproduce the problem or if you
already have one, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64; I used v7.1.0
// start3.sh loads the bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 (v6.2-rc5) kernel
// You can change the bzImage_xxx as you want
You can use the command below to log in; there is no password for root.
ssh -p 10023 root@localhost

Once you can log in to the VM (virtual machine), build the reproducer binary
and transfer it to the VM as follows, then reproduce the problem in the VM:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/

Get the bzImage for the target kernel:
Use the target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage // x should be less than or equal to the number of CPUs your PC has

Put the resulting bzImage file name into start3.sh above to boot the VM with the target kernel.
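
If it helps, the -j value for the build can be picked automatically instead
of by hand. The sketch below is only an illustration: pick_jobs is a
hypothetical helper name, and it assumes getconf _NPROCESSORS_ONLN is
available on the build host.

```shell
#!/bin/sh
# Hypothetical helper: print a job count for "make -j" that is never
# larger than the number of online CPUs.
pick_jobs() {
    want="${1:-0}"
    cpus=$(getconf _NPROCESSORS_ONLN)
    if [ "$want" -ge 1 ] && [ "$want" -le "$cpus" ]; then
        # The requested value fits, so use it.
        echo "$want"
    else
        # No request, or the request exceeds the CPU count: use all CPUs.
        echo "$cpus"
    fi
}

# Example usage, run in the kernel source tree after copying the kconfig:
#   make olddefconfig
#   make -j"$(pick_jobs 8)" bzImage
```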


Tips:
If you already have qemu-system-x86_64, please ignore the info below.
If you want to install qemu v7.1.0:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl
make
make install

---

Thanks!
BR.
-Pengfei (Intel)