Re: [PATCH v3 4/4] scsi: scsi_core: Fix IO hang when device removing

From: Wenchao Hao
Date: Thu Mar 07 2024 - 09:36:24 EST


On 2023/10/16 10:03, Wenchao Hao wrote:

shost_for_each_device() skips devices that are in the process of being
removed, so scsi_run_host_queues() never calls scsi_run_queue() for
these devices after the host's IO has been blocked.

An IO hang can occur when a device is in state SDEV_CANCEL and events
happen in the following order:

T1:                                     T2: scsi_error_handler
__scsi_remove_device()
  scsi_device_set_state(sdev, SDEV_CANCEL)
  ...
  sd_remove()
  del_gendisk()
  blk_mq_freeze_queue_wait()
                                        scsi_eh_flush_done_q()
                                          scsi_queue_insert(scmd,...)

scsi_queue_insert() no longer kicks the device's queue since commit
8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary").

After scsi_unjam_host(), the SCSI error handler calls
scsi_run_host_queues() to run the queues of the host's devices, but it
does not run the queues of devices that are being removed, because
shost_for_each_device() skips them.

As a result, the requests added to these queues are never handled, and
the device removal hangs as well.

Fix this issue by using shost_for_each_device_include_deleted() in
scsi_run_host_queues() so that the queues of devices being removed are
run too.
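
The difference between the two iterators can be illustrated with a small
user-space model. This is only a sketch and not kernel code: struct
toy_dev, the toy_state enum and all toy_*() helpers are hypothetical
stand-ins, and the include_deleted flag merely mimics the choice between
shost_for_each_device() and shost_for_each_device_include_deleted().

```c
/* Hypothetical stand-ins for the device states involved. */
enum toy_state { TOY_RUNNING, TOY_CANCEL };

struct toy_dev {
	enum toy_state state;
	int requeued;	/* requests sitting on the requeue list */
	int dispatched;	/* requests that were actually run */
};

/* Models running a device queue: dispatch everything that was requeued. */
static void toy_run_queue(struct toy_dev *d)
{
	d->dispatched += d->requeued;
	d->requeued = 0;
}

/*
 * Models scsi_run_host_queues(). With include_deleted == 0 the loop
 * skips devices that are being removed, as the skipping iterator does.
 */
static void toy_run_host_queues(struct toy_dev *devs, int n,
				int include_deleted)
{
	for (int i = 0; i < n; i++) {
		if (!include_deleted && devs[i].state == TOY_CANCEL)
			continue;
		toy_run_queue(&devs[i]);
	}
}

/* Returns the number of requests left stranded on the toy host. */
static int toy_stranded(int include_deleted)
{
	struct toy_dev devs[2] = {
		{ TOY_RUNNING, 1, 0 },
		{ TOY_CANCEL, 1, 0 },	/* removal in progress, one request requeued */
	};

	toy_run_host_queues(devs, 2, include_deleted);
	return devs[0].requeued + devs[1].requeued;
}
```

In this model toy_stranded(0) leaves the request of the device in
TOY_CANCEL on its requeue list, while toy_stranded(1) dispatches it.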

This issue has been fixed by commit 6df0e077d76bd ("scsi: core: Kick the
requeue list after inserting when flushing"), so this patch is no longer
needed.
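
Going only by that commit's title, the alternative is to kick the requeue
list at insertion time instead of relying on a later queue run. A minimal
user-space sketch of that idea follows; struct toy_queue and the toy_*()
helpers are hypothetical and do not reproduce the actual kernel change.

```c
#include <stdbool.h>

struct toy_queue {
	int requeue_list;	/* requests waiting for a kick */
	int dispatched;		/* requests that were actually run */
};

/* Models kicking the requeue list: pending requests get dispatched. */
static void toy_kick_requeue_list(struct toy_queue *q)
{
	q->dispatched += q->requeue_list;
	q->requeue_list = 0;
}

/* Models requeueing one command, optionally kicking right away. */
static void toy_queue_insert(struct toy_queue *q, bool kick)
{
	q->requeue_list++;
	if (kick)
		toy_kick_requeue_list(q);
}

/* Returns how many requests stay on the requeue list after one insert. */
static int toy_flush_one(bool kick)
{
	struct toy_queue q = { 0, 0 };

	toy_queue_insert(&q, kick);
	return q.requeue_list;
}
```

With kick == true nothing stays behind, so the outcome no longer depends
on whether a later host-wide queue run visits the device.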

Signed-off-by: Wenchao Hao <haowenchao2@xxxxxxxxxx>
---
drivers/scsi/scsi_lib.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 195ca80667d0..40f407ffd26f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -466,7 +466,7 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
 {
 	struct scsi_device *sdev;
-	shost_for_each_device(sdev, shost)
+	shost_for_each_device_include_deleted(sdev, shost)
 		scsi_run_queue(sdev->request_queue);
 }
