Re: [syzbot] [io-uring?] BUG: unable to handle kernel NULL pointer dereference in percpu_ref_put_many

From: Jens Axboe
Date: Mon Dec 23 2024 - 15:34:08 EST


On 12/23/24 12:52 PM, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: eabcdba3ad40 Merge tag 'for-6.13-rc3-tag' of git://git.ker..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=10871f44580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=c22efbd20f8da769
> dashboard link: https://syzkaller.appspot.com/bug?extid=3dcac84cc1d50f43ed31
> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=141bccf8580000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=135f7730580000

I ran this one but hit this instead:

==================================================================
BUG: KASAN: slab-out-of-bounds in nvmet_root_discovery_nqn_store+0x110/0x180
Write of size 256 at addr ffff000009e71180 by task refcrash/775

CPU: 0 UID: 0 PID: 775 Comm: refcrash Not tainted 6.13.0-rc4 #2
Hardware name: linux,dummy-virt (DT)
Call trace:
show_stack+0x1c/0x30 (C)
__dump_stack+0x24/0x30
dump_stack_lvl+0x60/0x80
print_address_description+0x88/0x220
print_report+0x4c/0x60
kasan_report+0x94/0xf0
kasan_check_range+0x248/0x288
__asan_memset+0x30/0x60
nvmet_root_discovery_nqn_store+0x110/0x180
configfs_write_iter+0x220/0x2e8
do_iter_readv_writev+0x2e0/0x458
vfs_writev+0x220/0x728
do_writev+0xf8/0x1a8
__arm64_sys_writev+0x80/0x98
invoke_syscall+0x7c/0x258
el0_svc_common+0x108/0x1d0
do_el0_svc+0x4c/0x60
el0_svc+0x4c/0xa0
el0t_64_sync_handler+0x70/0x100
el0t_64_sync+0x170/0x178

Allocated by task 1:
kasan_save_track+0x2c/0x60
kasan_save_alloc_info+0x3c/0x48
__kasan_kmalloc+0x80/0x98
__kmalloc_node_track_caller_noprof+0x2f0/0x590
kstrndup+0x4c/0xb8
nvmet_subsys_alloc+0x1c4/0x498
nvmet_init_discovery+0x20/0x48
nvmet_init+0x18c/0x1c0
do_one_initcall+0x1a4/0x718
do_initcall_level+0x178/0x348
do_initcalls+0x58/0xa0
do_basic_setup+0x7c/0x98
kernel_init_freeable+0x268/0x380
kernel_init+0x24/0x148
ret_from_fork+0x10/0x20

The buggy address belongs to the object at ffff000009e71180
which belongs to the cache kmalloc-64 of size 64
The buggy address is located 0 bytes inside of
allocated 37-byte region [ffff000009e71180, ffff000009e711a5)

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x49e71
anon flags: 0x3ffe00000000000(node=0|zone=0|lastcpupid=0x1fff)
page_type: f5(slab)
raw: 03ffe00000000000 ffff0000070028c0 fffffdffc0523d80 dead000000000005
raw: 0000000000000000 0000000000200020 00000001f5000000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff000009e71080: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
ffff000009e71100: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
>ffff000009e71180: 00 00 00 00 05 fc fc fc fc fc fc fc fc fc fc fc
^
ffff000009e71200: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
ffff000009e71280: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
==================================================================
Disabling lock debugging due to kernel taint

which makes me think something else is the culprit here. The test case
doesn't do much beyond creating two rings; it doesn't actually use
them.

CC'ing likely suspects on the nvme front. This is on 6.13-rc4 fwiw.

--
Jens Axboe