[Syzkaller & bisect] Possible deadlock in __bpf_ringbuf_reserve
From: Lai, Yi
Date: Thu Oct 10 2024 - 02:56:10 EST
linux-kernel@xxxxxxxxxxxxxxx,syzkaller-bugs@xxxxxxxxxxxxxxxx,yi1.lai@xxxxxxxxx
Bcc:
Subject: [Syzkaller & bisect] Possible deadlock in
__bpf_ringbuf_reserve in linux-next
Reply-To:
Hi Matthew,
Greetings!
I used Syzkaller and found a possible deadlock in __bpf_ringbuf_reserve in the linux-next tree.
After bisection, the first bad commit is:
"
cb995f4eeba9 filemap: Handle sibling entries in filemap_get_read_batch()
"
All detailed info can be found at:
https://github.com/laifryiee/syzkaller_logs/tree/main/241001_120142___bpf_ringbuf_reserve
Syzkaller repro code:
https://github.com/laifryiee/syzkaller_logs/tree/main/241001_120142___bpf_ringbuf_reserve/repro.c
Syzkaller repro syscall steps:
https://github.com/laifryiee/syzkaller_logs/tree/main/241001_120142___bpf_ringbuf_reserve/repro.prog
Syzkaller report:
https://github.com/laifryiee/syzkaller_logs/tree/main/241001_120142___bpf_ringbuf_reserve/repro.report
Kconfig(make olddefconfig):
https://github.com/laifryiee/syzkaller_logs/tree/main/241001_120142___bpf_ringbuf_reserve/kconfig_origin
Bisect info:
https://github.com/laifryiee/syzkaller_logs/tree/main/241001_120142___bpf_ringbuf_reserve/bisect_info.log
bzImage:
https://github.com/laifryiee/syzkaller_logs/raw/refs/heads/main/241001_120142___bpf_ringbuf_reserve/bzImage_9852d85ec9d492ebef56dc5f229416c925758edc
Issue dmesg:
https://github.com/laifryiee/syzkaller_logs/blob/main/241001_120142___bpf_ringbuf_reserve/9852d85ec9d492ebef56dc5f229416c925758edc_dmesg.log
"
[ 17.246785] 6.12.0-rc1-9852d85ec9d4 #1 Not tainted
[ 17.247068] --------------------------------------------
[ 17.247361] repro/753 is trying to acquire lock:
[ 17.247639] ffffc9000135c0d8 (&rb->spinlock){-.-.}-{2:2}, at: __bpf_ringbuf_reserve+0x386/0x460
[ 17.248186]
[ 17.248186] but task is already holding lock:
[ 17.248531] ffffc900014fa0d8 (&rb->spinlock){-.-.}-{2:2}, at: __bpf_ringbuf_reserve+0x386/0x460
[ 17.249053]
[ 17.249053] other info that might help us debug this:
[ 17.249435] Possible unsafe locking scenario:
[ 17.249435]
[ 17.249783] CPU0
[ 17.249936] ----
[ 17.250090] lock(&rb->spinlock);
[ 17.250303] lock(&rb->spinlock);
[ 17.250516]
[ 17.250516] *** DEADLOCK ***
[ 17.250516]
[ 17.250866] May be due to missing lock nesting notation
[ 17.250866]
[ 17.251267] 5 locks held by repro/753:
[ 17.251496] #0: ffffffff87172f68 (tracepoints_mutex){+.+.}-{3:3}, at: tracepoint_probe_unregister+0x39/0xc70
[ 17.252099] #1: ffffffff87172f20 (tracepoints_mutex.wait_lock){+.+.}-{2:2}, at: __mutex_lock+0x100e/0x1490
[ 17.252697] #2: ffffffff8705c9c0 (rcu_read_lock){....}-{1:2}, at: bpf_trace_run2+0x1b7/0x5a0
[ 17.253217] #3: ffffc900014fa0d8 (&rb->spinlock){-.-.}-{2:2}, at: __bpf_ringbuf_reserve+0x386/0x460
[ 17.253766] #4: ffffffff8705c9c0 (rcu_read_lock){....}-{1:2}, at: bpf_trace_run2+0x1b7/0x5a0
[ 17.254284]
[ 17.254284] stack backtrace:
[ 17.254551] CPU: 0 UID: 0 PID: 753 Comm: repro Not tainted 6.12.0-rc1-9852d85ec9d4 #1
[ 17.255020] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 17.255686] Call Trace:
[ 17.255844] <TASK>
[ 17.255980] dump_stack_lvl+0xea/0x150
[ 17.256223] dump_stack+0x19/0x20
[ 17.256434] print_deadlock_bug+0x3c5/0x680
[ 17.256700] __lock_acquire+0x2a85/0x5c90
[ 17.256954] ? __pfx___lock_acquire+0x10/0x10
[ 17.257224] ? __kasan_check_read+0x15/0x20
[ 17.257486] ? __lock_acquire+0xd87/0x5c90
[ 17.257741] ? __pfx_mark_lock.part.0+0x10/0x10
[ 17.258023] lock_acquire.part.0+0x142/0x390
[ 17.258289] ? __bpf_ringbuf_reserve+0x386/0x460
[ 17.258574] ? __pfx_lock_acquire.part.0+0x10/0x10
[ 17.258868] ? __lock_acquire+0xd87/0x5c90
[ 17.259131] ? debug_smp_processor_id+0x20/0x30
[ 17.259414] ? rcu_is_watching+0x19/0xc0
[ 17.259669] ? trace_lock_acquire+0x139/0x1b0
[ 17.259956] lock_acquire+0x80/0xb0
[ 17.260178] ? __bpf_ringbuf_reserve+0x386/0x460
[ 17.260472] _raw_spin_lock_irqsave+0x52/0x80
[ 17.260754] ? __bpf_ringbuf_reserve+0x386/0x460
[ 17.261050] __bpf_ringbuf_reserve+0x386/0x460
[ 17.261339] ? trace_lock_acquire+0x139/0x1b0
[ 17.261629] bpf_ringbuf_reserve+0x63/0xa0
[ 17.261898] bpf_prog_9efe54833449f08e+0x2e/0x48
[ 17.262200] bpf_trace_run2+0x238/0x5a0
[ 17.262454] ? __pfx_bpf_trace_run2+0x10/0x10
[ 17.262734] ? __lock_acquire+0x1b0f/0x5c90
[ 17.263016] ? pndisc_destructor+0x1c0/0x250
[ 17.263301] ? __pfx___bpf_trace_contention_end+0x10/0x10
[ 17.263628] __bpf_trace_contention_end+0xf/0x20
[ 17.263914] __traceiter_contention_end+0x66/0xb0
[ 17.264204] trace_contention_end.constprop.0+0xdc/0x140
[ 17.264534] __pv_queued_spin_lock_slowpath+0x29a/0xc80
[ 17.264881] ? __pfx___pv_queued_spin_lock_slowpath+0x10/0x10
[ 17.265247] do_raw_spin_lock+0x1fb/0x280
[ 17.265506] ? __pfx_do_raw_spin_lock+0x10/0x10
[ 17.265792] ? lock_acquire+0x80/0xb0
[ 17.266029] ? __bpf_ringbuf_reserve+0x386/0x460
[ 17.266329] _raw_spin_lock_irqsave+0x5a/0x80
[ 17.266604] ? __bpf_ringbuf_reserve+0x386/0x460
[ 17.266891] __bpf_ringbuf_reserve+0x386/0x460
[ 17.267180] ? trace_lock_acquire+0x139/0x1b0
[ 17.267463] bpf_ringbuf_reserve+0x63/0xa0
[ 17.267720] bpf_prog_9efe54833449f08e+0x2e/0x48
[ 17.268003] bpf_trace_run2+0x238/0x5a0
[ 17.268244] ? __pfx_bpf_trace_run2+0x10/0x10
[ 17.268521] ? __kasan_check_write+0x18/0x20
[ 17.268784] ? do_raw_spin_lock+0x141/0x280
[ 17.269046] ? __pfx___bpf_trace_contention_end+0x10/0x10
[ 17.269375] __bpf_trace_contention_end+0xf/0x20
[ 17.269661] __traceiter_contention_end+0x66/0xb0
[ 17.269952] trace_contention_end+0xc5/0x120
[ 17.270218] ? __mutex_lock+0x1035/0x1490
[ 17.270468] __mutex_lock+0x6bd/0x1490
[ 17.270705] ? tracepoint_probe_unregister+0x39/0xc70
[ 17.271021] ? __pfx___mutex_lock+0x10/0x10
[ 17.271281] ? delete_node+0x219/0x750
[ 17.271529] ? __pfx_bpf_link_release+0x10/0x10
[ 17.271816] mutex_lock_nested+0x1f/0x30
[ 17.272062] ? mutex_lock_nested+0x1f/0x30
[ 17.272318] tracepoint_probe_unregister+0x39/0xc70
[ 17.272618] ? __pfx_bpf_link_release+0x10/0x10
[ 17.272900] ? __pfx___bpf_trace_contention_end+0x10/0x10
[ 17.273228] ? __pfx_bpf_link_release+0x10/0x10
[ 17.273510] bpf_probe_unregister+0x5b/0x90
[ 17.273773] bpf_raw_tp_link_release+0x3f/0x80
[ 17.274050] bpf_link_free+0x139/0x2d0
[ 17.274288] bpf_link_release+0x68/0x80
[ 17.274530] __fput+0x414/0xb60
[ 17.274737] ____fput+0x22/0x30
[ 17.274938] task_work_run+0x19c/0x2b0
[ 17.275185] ? __pfx_task_work_run+0x10/0x10
"
I hope you find it useful.
Regards,
Yi Lai
---
If you don't need the following environment to reproduce the problem, or if you
already have a reproduction environment set up, please ignore the following information.
How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64; I used v7.1.0
// start3.sh will load the bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
// You can change the bzImage_xxx as you want
// You may need to remove the line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for a different qemu version
You can use the command below to log in; there is no password for root.
ssh -p 10023 root@localhost
After logging in to the VM (virtual machine) successfully, you can transfer the
reproducer binary to the VM as shown below and reproduce the problem there:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/
To build the bzImage for the target kernel:
Use the target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage // x should be equal to or less than the number of CPUs your PC has
Point start3.sh above at the resulting bzImage file to load the target kernel in the VM.
Tips:
If you already have qemu-system-x86_64, please ignore the info below.
To install qemu v7.1.0:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install