Re: [PATCH] char: misc: make misc_open() and misc_register() killable
From: Tetsuo Handa
Date: Wed Jul 06 2022 - 06:26:38 EST
On 2022/07/06 15:34, Greg KH wrote:
> On Wed, Jul 06, 2022 at 03:21:15PM +0900, Tetsuo Handa wrote:
>> How should we fix this problem?
>
> We can decrease the timeout in usb_stor_msg_common(). I imagine that if
> that timeout is ever hit in this sequence, then all will recover, right?
> Try decreasing it to a sane number and see what happens.
Yes, all recovers with below diff.
------------------------------------------------------------
diff --git a/drivers/usb/storage/transport.c b/drivers/usb/storage/transport.c
index 1928b3918242..d2a192306e0c 100644
--- a/drivers/usb/storage/transport.c
+++ b/drivers/usb/storage/transport.c
@@ -164,7 +164,7 @@ static int usb_stor_msg_common(struct us_data *us, int timeout)
/* wait for the completion of the URB */
timeleft = wait_for_completion_interruptible_timeout(
- &urb_done, timeout ? : MAX_SCHEDULE_TIMEOUT);
+ &urb_done, timeout ? : 5 * HZ);
clear_bit(US_FLIDX_URB_ACTIVE, &us->dflags);
------------------------------------------------------------
But
>> Anyway,
>>
>> /*
>> * Resuming. We may need to wait for the image device to
>> * appear.
>> */
>> wait_for_device_probe();
>>
>> in snapshot_open() will sleep forever if some device became unresponsive.
>>
wait_for_device_probe() in snapshot_open() was added by commit c751085943362143
("PM/Hibernate: Wait for SCSI devices scan to complete during resume"), and
that commit did not take into account possibility of unresponsive hardware.
"In addition, if the resume from hibernation is userland-driven, it's
better to wait for all device probes in the kernel to complete before
attempting to open the resume device."
It is trivial to make e.g. atomic_read(&probe_count) == 10, which means that
acceptable timeout for usb_stor_msg_common() may be no longer acceptable timeout
for wait_for_device_probe(). Unlike flush_workqueue(), wait_for_device_probe()
can wait forever if new probe requests keep coming in while waiting for existing
probe requests to complete. Therefore, I think we should introduce timeout on
wait_for_device_probe() side as well.
I would like to propose below changes in 3 patches as fixes for this problem.
Since there are 13 wait_for_device_probe() callers, maybe we want both killable
and uninterruptible versions and pass timeout as an argument...
------------------------------------------------------------
drivers/base/dd.c | 3 ++-
drivers/char/misc.c | 9 ++++++---
drivers/usb/storage/transport.c | 2 +-
3 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 3fc3b5940bb3..67e08b381ee2 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -723,7 +723,8 @@ void wait_for_device_probe(void)
flush_work(&deferred_probe_work);
/* wait for the known devices to complete their probing */
- wait_event(probe_waitqueue, atomic_read(&probe_count) == 0);
+ wait_event_killable_timeout(probe_waitqueue,
+ atomic_read(&probe_count) == 0, 60 * HZ);
async_synchronize_full();
}
EXPORT_SYMBOL_GPL(wait_for_device_probe);
diff --git a/drivers/char/misc.c b/drivers/char/misc.c
index ca5141ed5ef3..6430c534a1cb 100644
--- a/drivers/char/misc.c
+++ b/drivers/char/misc.c
@@ -104,7 +104,8 @@ static int misc_open(struct inode *inode, struct file *file)
int err = -ENODEV;
const struct file_operations *new_fops = NULL;
- mutex_lock(&misc_mtx);
+ if (mutex_lock_killable(&misc_mtx))
+ return -EINTR;
list_for_each_entry(c, &misc_list, list) {
if (c->minor == minor) {
@@ -116,7 +117,8 @@ static int misc_open(struct inode *inode, struct file *file)
if (!new_fops) {
mutex_unlock(&misc_mtx);
request_module("char-major-%d-%d", MISC_MAJOR, minor);
- mutex_lock(&misc_mtx);
+ if (mutex_lock_killable(&misc_mtx))
+ return -EINTR;
list_for_each_entry(c, &misc_list, list) {
if (c->minor == minor) {
@@ -178,7 +180,8 @@ int misc_register(struct miscdevice *misc)
INIT_LIST_HEAD(&misc->list);
- mutex_lock(&misc_mtx);
+ if (mutex_lock_killable(&misc_mtx))
+ return -EINTR;
if (is_dynamic) {
int i = find_first_zero_bit(misc_minors, DYNAMIC_MINORS);
diff --git a/drivers/usb/storage/transport.c b/drivers/usb/storage/transport.c
index 1928b3918242..d2a192306e0c 100644
--- a/drivers/usb/storage/transport.c
+++ b/drivers/usb/storage/transport.c
@@ -164,7 +164,7 @@ static int usb_stor_msg_common(struct us_data *us, int timeout)
/* wait for the completion of the URB */
timeleft = wait_for_completion_interruptible_timeout(
- &urb_done, timeout ? : MAX_SCHEDULE_TIMEOUT);
+ &urb_done, timeout ? : 60 * HZ);
clear_bit(US_FLIDX_URB_ACTIVE, &us->dflags);
------------------------------------------------------------