Re: [syzbot] [net?] INFO: task hung in new_device_store (5)

From: Hillf Danton
Date: Fri Sep 27 2024 - 07:07:15 EST


On Thu, 26 Sep 2024 22:14:14 +0200 Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> On Thu, Sep 26, 2024 at 7:58 PM syzbot wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 97d8894b6f4c Merge tag 'riscv-for-linus-6.12-mw1' of git:/..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=12416a27980000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=bc30a30374b0753
> > dashboard link: https://syzkaller.appspot.com/bug?extid=05f9cecd28e356241aba
> > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/bd119f4fdc08/disk-97d8894b.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/4d0bfed66f93/vmlinux-97d8894b.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/0f9223ac9bfb/bzImage-97d8894b.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+05f9cecd28e356241aba@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > INFO: task syz-executor:9916 blocked for more than 143 seconds.
> > Not tainted 6.11.0-syzkaller-10045-g97d8894b6f4c #0
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > task:syz-executor state:D stack:21104 pid:9916 tgid:9916 ppid:1 flags:0x00000004
> > Call Trace:
> > <TASK>
> > context_switch kernel/sched/core.c:5315 [inline]
> > __schedule+0x1895/0x4b30 kernel/sched/core.c:6674
> > __schedule_loop kernel/sched/core.c:6751 [inline]
> > schedule+0x14b/0x320 kernel/sched/core.c:6766
> > schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6823
> > __mutex_lock_common kernel/locking/mutex.c:684 [inline]
> > __mutex_lock+0x6a7/0xd70 kernel/locking/mutex.c:752
> > new_device_store+0x1b4/0x890 drivers/net/netdevsim/bus.c:166
> > kernfs_fop_write_iter+0x3a2/0x500 fs/kernfs/file.c:334
> > new_sync_write fs/read_write.c:590 [inline]
> > vfs_write+0xa6f/0xc90 fs/read_write.c:683
> > ksys_write+0x183/0x2b0 fs/read_write.c:736
> > do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > RIP: 0033:0x7f8310d7c9df
> > RSP: 002b:00007ffe830a52e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
> > RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f8310d7c9df
> > RDX: 0000000000000003 RSI: 00007ffe830a5330 RDI: 0000000000000005
> > RBP: 00007f8310df1c39 R08: 0000000000000000 R09: 00007ffe830a5137
> > R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000003
> > R13: 00007ffe830a5330 R14: 00007f8311a64620 R15: 0000000000000003
> > </TASK>
>
> typical sysfs deadlock ?
>
> diff --git a/drivers/net/netdevsim/bus.c b/drivers/net/netdevsim/bus.c
> index 64c0cdd31bf85468ce4fa2b2af5c8aff4cfba897..3bf0ce52d71653fd9b8c752d52d0b5b7e19042d8 100644
> --- a/drivers/net/netdevsim/bus.c
> +++ b/drivers/net/netdevsim/bus.c
> @@ -163,7 +163,9 @@ new_device_store(const struct bus_type *bus, const char *buf, size_t count)
> return -EINVAL;
> }
>
> - mutex_lock(&nsim_bus_dev_list_lock);
> + if (!mutex_trylock(&nsim_bus_dev_list_lock))
> + return restart_syscall();
> +
> /* Prevent to use resource before initialization. */
> if (!smp_load_acquire(&nsim_bus_enable)) {
> err = -EBUSY;
>
>
> >
> > Showing all locks held in the system:
...
> > 4 locks held by syz-executor/9916:
> > #0: ffff88807ca86420 (sb_writers#8){.+.+}-{0:0}, at: file_start_write include/linux/fs.h:2930 [inline]
> > #0: ffff88807ca86420 (sb_writers#8){.+.+}-{0:0}, at: vfs_write+0x224/0xc90 fs/read_write.c:679
> > #1: ffff88802e71e488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x1ea/0x500 fs/kernfs/file.c:325
> > #2: ffff888144ff5968 (kn->active#50){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x20e/0x500 fs/kernfs/file.c:326
> > #3: ffffffff8f56d3e8 (nsim_bus_dev_list_lock){+.+.}-{3:3}, at: new_device_store+0x1b4/0x890 drivers/net/netdevsim/bus.c:166

syz-executor/9916 is the lock waiter, and

> > 7 locks held by syz-executor/9976:
> > #0: ffff88807ca86420 (sb_writers#8){.+.+}-{0:0}, at: file_start_write include/linux/fs.h:2930 [inline]
> > #0: ffff88807ca86420 (sb_writers#8){.+.+}-{0:0}, at: vfs_write+0x224/0xc90 fs/read_write.c:679
> > #1: ffff88807abc2888 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x1ea/0x500 fs/kernfs/file.c:325
> > #2: ffff888144ff5a58 (kn->active#49){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x20e/0x500 fs/kernfs/file.c:326
> > #3: ffffffff8f56d3e8 (nsim_bus_dev_list_lock){+.+.}-{3:3}, at: del_device_store+0xfc/0x480 drivers/net/netdevsim/bus.c:216
> > #4: ffff888060f5a0e8 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:1014 [inline]
> > #4: ffff888060f5a0e8 (&dev->mutex){....}-{3:3}, at: __device_driver_lock drivers/base/dd.c:1095 [inline]
> > #4: ffff888060f5a0e8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0xce/0x7c0 drivers/base/dd.c:1293
> > #5: ffff888060f5b250 (&devlink->lock_key#40){+.+.}-{3:3}, at: nsim_drv_remove+0x50/0x160 drivers/net/netdevsim/dev.c:1672
> > #6: ffffffff8fccdc48 (rtnl_mutex){+.+.}-{3:3}, at: nsim_destroy+0x71/0x5c0 drivers/net/netdevsim/netdev.c:773

syz-executor/9976 is the lock owner. Given that both the waiter and the owner
are printed, the proposed trylock looks like a typical paper-over, at least
from a hoofed skull, because no real deadlock was detected.
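
To make the waiter/owner distinction concrete, here is a minimal userspace
sketch using pthreads. It is not kernel code, and every name in it is a
hypothetical stand-in: list_lock plays nsim_bus_dev_list_lock and stall_lock
plays whatever the owner is stuck behind. That the owner is stalled on a
further lock is an assumption made for illustration; the report does not
establish it. The sketch only shows that a trylock turns a sleeping waiter
into an immediate failure while the owner's hold time stays the same.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-ins for the locks named in the report. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;  /* nsim_bus_dev_list_lock */
static pthread_mutex_t stall_lock = PTHREAD_MUTEX_INITIALIZER; /* what the owner waits on */
static atomic_int owner_ready; /* set once the owner holds list_lock */

/* Plays the del_device_store() side: take list_lock, then stall holding it. */
static void *owner_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&list_lock);
	atomic_store(&owner_ready, 1);
	pthread_mutex_lock(&stall_lock);   /* blocks until the main thread lets go */
	pthread_mutex_unlock(&stall_lock);
	pthread_mutex_unlock(&list_lock);
	return NULL;
}

/* Plays the patched new_device_store() side: fail fast instead of sleeping. */
static int waiter_trylock(void)
{
	int ret = pthread_mutex_trylock(&list_lock);

	if (ret == 0)
		pthread_mutex_unlock(&list_lock);
	return ret; /* 0 on success, EBUSY while the owner holds list_lock */
}

/*
 * Returns how many of the attempts failed with EBUSY while the owner was
 * stalled, or -1 if the lock was still unavailable after the owner exited.
 */
static int demo_trylock_vs_stalled_owner(int attempts)
{
	pthread_t owner;
	int busy = 0;
	int rc;

	atomic_store(&owner_ready, 0);
	pthread_mutex_lock(&stall_lock);   /* keep the owner stalled */

	rc = pthread_create(&owner, NULL, owner_thread, NULL);
	assert(rc == 0);
	(void)rc;

	while (!atomic_load(&owner_ready))
		sched_yield();             /* wait until the owner holds list_lock */

	for (int i = 0; i < attempts; i++)
		if (waiter_trylock() == EBUSY)
			busy++;            /* no hang, but no progress either */

	pthread_mutex_unlock(&stall_lock); /* the owner can finally finish */
	pthread_join(owner, NULL);

	/* Once the owner has dropped list_lock, the same call succeeds. */
	if (waiter_trylock() != 0)
		return -1;
	return busy;
}
```

Calling demo_trylock_vs_stalled_owner(5) makes five attempts while the owner
is stalled. Each one fails without sleeping, so nothing resembling a hung
task would be reported, yet the owner holds list_lock exactly as long as it
did before.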