Re: [PATCH] char: misc: make misc_open() and misc_register() killable
From: Tetsuo Handa
Date: Tue Jul 05 2022 - 01:21:27 EST
On 2022/07/04 23:31, Greg KH wrote:
> I don't understand what you are trying to "fix" here. What is userspace
> doing (as a normal user) that is causing a problem, and what problem is
> it causing and for what device/hardware/driver is this a problem?
Currently the root cause is unknown.
This might be another example of a deadlock hidden by device_initialize(), which calls
lockdep_set_novalidate_class() on dev->mutex so that lockdep does not validate it.
We can see from https://syzkaller.appspot.com/text?tag=CrashReport&x=11feb7e0080000 that,
when khungtaskd reports that a process is blocked waiting for misc_mtx in misc_open(),
there is another process which holds misc_mtx and also holds system_transition_mutex
taken in snapshot_open().
----------------------------------------
INFO: task syz-executor.4:21922 blocked for more than 143 seconds.
Not tainted 5.19.0-rc4-syzkaller-00187-g089866061428 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.4 state:D stack:28408 pid:21922 ppid: 3666 flags:0x00000004
2 locks held by syz-executor.5/21906:
#0: ffffffff8c82f708 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x5f/0x4a0 drivers/char/misc.c:107
#1: ffffffff8bc536e8 (system_transition_mutex){+.+.}-{3:3}, at: snapshot_open+0x3b/0x2a0 kernel/power/user.c:54
1 lock held by syz-executor.4/21922:
#0: ffffffff8c82f708 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x5f/0x4a0 drivers/char/misc.c:107
----------------------------------------
Possible locations where snapshot_open() might sleep with system_transition_mutex held are
pm_notifier_call_chain_robust()/wait_for_device_probe()/create_basic_memory_bitmaps().
But I think we can exclude pm_notifier_call_chain_robust(), because lockdep does not report
that process as holding "struct blocking_notifier_head"->rwsem. I suspect that the
process is sleeping in wait_for_device_probe(), since it waits for probe operations to finish.
----------------------------------------
void wait_for_device_probe(void)
{
	/* wait for the deferred probe workqueue to finish */
	flush_work(&deferred_probe_work);
	/* wait for the known devices to complete their probing */
	wait_event(probe_waitqueue, atomic_read(&probe_count) == 0);
	async_synchronize_full();
}
----------------------------------------
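As a rough userspace analogy of the wait_event() call above (a pthreads sketch, not the
kernel implementation; struct probe_state, probe_begin(), probe_end() and
wait_for_probes() are names made up for illustration), the waiter has no timeout and
returns only when the in-flight count drops to zero:

```c
#include <pthread.h>

/* Hypothetical userspace stand-in for probe_count + probe_waitqueue. */
struct probe_state {
	pthread_mutex_t lock;
	pthread_cond_t idle;	/* signalled when probe_count reaches 0 */
	int probe_count;	/* number of probes currently in flight */
};

void probe_state_init(struct probe_state *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->idle, NULL);
	s->probe_count = 0;
}

/* A probe operation starts. */
void probe_begin(struct probe_state *s)
{
	pthread_mutex_lock(&s->lock);
	s->probe_count++;
	pthread_mutex_unlock(&s->lock);
}

/* A probe operation finishes; wake the waiters once nothing is in flight. */
void probe_end(struct probe_state *s)
{
	pthread_mutex_lock(&s->lock);
	if (--s->probe_count == 0)
		pthread_cond_broadcast(&s->idle);
	pthread_mutex_unlock(&s->lock);
}

/* Block, without any timeout, until no probe is in flight. */
void wait_for_probes(struct probe_state *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->probe_count != 0)
		pthread_cond_wait(&s->idle, &s->lock);
	pthread_mutex_unlock(&s->lock);
}
```

If one probe_begin() is never paired with a probe_end(), wait_for_probes() never returns.
In the report, that would leave snapshot_open() asleep with misc_mtx and
system_transition_mutex still held.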
>
> Yes, you can sleep in open(), but you shouldn't sleep long, if at all
> possible as it can be annoying. So why not fix up the offending driver
> not to sleep too long?
We can't predict how long snapshot_open() sleeps inside wait_for_device_probe().
Looking at the above-mentioned report again, a common pattern is that one process is
inside input_register_handle() and another process is inside input_close_device(),
and lockdep reports both processes as holding the same &dev->mutex#2 object. Judging from
the code, input_register_handle() does not sleep with dev->mutex held, so I guess that
input_close_device() is holding dev->mutex and input_register_handle() is
waiting for input_close_device() to release dev->mutex.
Therefore, there might be a race or deadlock between these two processes.
If &dev->mutex#2 were subjected to the device_initialize() magic, lockdep would not be
able to catch the deadlock. But I'm not familiar with the device management code...
Maybe input_close_device() is failing to release dev->mutex for some reason?
Or maybe nothing is wrong and it is simply too slow to wait for?
----------------------------------------
7 locks held by kworker/1:0/22:
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1280 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:636 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:663 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x87a/0x1610 kernel/workqueue.c:2260
#1: ffffc900001c7da8 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x8ae/0x1610 kernel/workqueue.c:2264
#2: ffff8881479d4190 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:835 [inline]
#2: ffff8881479d4190 (&dev->mutex){....}-{3:3}, at: hub_event+0x1c1/0x4690 drivers/usb/core/hub.c:5691
#3: ffff888044782190 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:835 [inline]
#3: ffff888044782190 (&dev->mutex){....}-{3:3}, at: __device_attach+0x76/0x530 drivers/base/dd.c:964
#4: ffff8880447d2118 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:835 [inline]
#4: ffff8880447d2118 (&dev->mutex){....}-{3:3}, at: __device_attach+0x76/0x530 drivers/base/dd.c:964
#5: ffffffff8ceafca8 (input_mutex){+.+.}-{3:3}, at: input_register_device.cold+0x34/0x304 drivers/input/input.c:2378
#6: ffff8880447d52c0 (&dev->mutex#2){+.+.}-{3:3}, at: input_register_handle+0x6d/0x510 drivers/input/input.c:2544
2 locks held by acpid/2962:
#0: ffff888042a28158 (&joydev->mutex){+.+.}-{3:3}, at: joydev_close_device drivers/input/joydev.c:220 [inline]
#0: ffff888042a28158 (&joydev->mutex){+.+.}-{3:3}, at: joydev_release+0x187/0x290 drivers/input/joydev.c:252
#1: ffff8880447d52c0 (&dev->mutex#2){+.+.}-{3:3}, at: input_close_device+0x42/0x1f0 drivers/input/input.c:726
7 locks held by kworker/1:11/5743:
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1280 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:636 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:663 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x87a/0x1610 kernel/workqueue.c:2260
#1: ffffc900153c7da8 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x8ae/0x1610 kernel/workqueue.c:2264
#2: ffff888021384190 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:835 [inline]
#2: ffff888021384190 (&dev->mutex){....}-{3:3}, at: hub_event+0x1c1/0x4690 drivers/usb/core/hub.c:5691
#3: ffff8880468a4190 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:835 [inline]
#3: ffff8880468a4190 (&dev->mutex){....}-{3:3}, at: __device_attach+0x76/0x530 drivers/base/dd.c:964
#4: ffff8880468a6118 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:835 [inline]
#4: ffff8880468a6118 (&dev->mutex){....}-{3:3}, at: __device_attach+0x76/0x530 drivers/base/dd.c:964
#5: ffff8880255f1a20 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:835 [inline]
#5: ffff8880255f1a20 (&dev->mutex){....}-{3:3}, at: __device_attach+0x76/0x530 drivers/base/dd.c:964
#6: ffffffff8ceafca8 (input_mutex){+.+.}-{3:3}, at: input_register_device.cold+0x34/0x304 drivers/input/input.c:2378
----------------------------------------
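To cross-check which lock instances appear in more than one task's list, the dump can be
scanned mechanically. Below is a minimal sketch in userspace C, assuming the
"N locks held by <task>:" and "#<idx>: <address> (<name>)..." layout shown above;
struct lock_table, feed_line() and tasks_listing() are hypothetical helper names.
Note that a lock a task is still blocked on also appears in that task's list, because
lockdep records the acquisition before the task goes to sleep on it, so an address
listed by two tasks indicates contention, not two simultaneous owners.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LOCKS 128
#define TASK_LEN  64

struct lock_use {
	unsigned long long addr;	/* lock instance address from the dump */
	char last_task[TASK_LEN];	/* last task seen listing this address */
	int tasks;			/* number of distinct tasks listing it */
};

struct lock_table {
	struct lock_use use[MAX_LOCKS];
	int n;
	char cur_task[TASK_LEN];	/* task named by the latest header line */
};

/* Feed one line of the dump; lines in any other format are ignored. */
void feed_line(struct lock_table *t, const char *line)
{
	unsigned long long addr;
	char task[TASK_LEN];
	size_t len;
	int num, i;

	/* Header: "7 locks held by kworker/1:0/22:" or "1 lock held by ...:" */
	if (sscanf(line, "%d locks held by %63s", &num, task) == 2 ||
	    sscanf(line, "%d lock held by %63s", &num, task) == 2) {
		len = strlen(task);
		if (len && task[len - 1] == ':')
			task[len - 1] = '\0';	/* drop the trailing colon */
		strcpy(t->cur_task, task);
		return;
	}

	/* Lock entry: "#6: ffff8880447d52c0 (&dev->mutex#2){+.+.}-{3:3}, at: ..." */
	if (t->cur_task[0] == '\0' ||
	    sscanf(line, " #%d: %llx", &num, &addr) != 2)
		return;

	for (i = 0; i < t->n; i++) {
		if (t->use[i].addr != addr)
			continue;
		/* Repeated [inline] frames of one task are counted once. */
		if (strcmp(t->use[i].last_task, t->cur_task) != 0) {
			t->use[i].tasks++;
			strcpy(t->use[i].last_task, t->cur_task);
		}
		return;
	}
	if (t->n < MAX_LOCKS) {
		t->use[t->n].addr = addr;
		strcpy(t->use[t->n].last_task, t->cur_task);
		t->use[t->n].tasks = 1;
		t->n++;
	}
}

/* Number of distinct tasks whose list contains addr (0 if never seen). */
int tasks_listing(const struct lock_table *t, unsigned long long addr)
{
	int i;

	for (i = 0; i < t->n; i++)
		if (t->use[i].addr == addr)
			return t->use[i].tasks;
	return 0;
}
```

Zero the table (for example with memset()) and feed the dump one line at a time.
For the dump above, tasks_listing() should return 2 for ffff8880447d52c0 (&dev->mutex#2,
listed by kworker/1:0/22 and acpid/2962) and for ffffffff8ceafca8 (input_mutex, listed
by both kworkers). The (wq_completion)usb_hub_wq entry is also listed by both kworkers,
but as far as I understand that is a workqueue annotation rather than a contended mutex.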