[PATCH 0/1] Fix not fully initialized SCSI commands
From: Anastasia Kovaleva
Date: Mon Mar 24 2025 - 04:51:59 EST
We have encountered logs like the following on initiators:
kernel: sd 16:0:1:84: [sdts] tag#405 timing out command, waited 720s
kernel: sd 16:0:1:84: [sdts] tag#405 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=66636s
The initiator uses dm-mpath for multipathing, the SCSI mid layer, and
the QLogic FC HBA driver (qla2xxx). After debugging, the following call
stack was identified:
blk_mq_sched_dispatch_requests()
  blk_mq_dispatch_rq_list()
    dm_mq_queue_rq()
      map_request()
        ti->type->clone_and_map_rq() // New cloned request with tag 405
        blk_insert_cloned_request()
          scsi_queue_rq()
            qla2xxx_mqueuecommand()
              qla2xxx_dif_start_scsi_mq()
If qla2xxx_dif_start_scsi_mq() returns an error for any reason (e.g.,
because extremely heavy traffic causes the driver to exhaust its
handles), qla2xxx_mqueuecommand() does not call scsi_done() ->
scsi_end_request(). As a result, the SCMD_INITIALIZED flag remains
set. map_request() then releases the cloned request and requeues the
original request. Although the cloned request is released, the
associated SCSI command still holds stale data from the previous
command.
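The lifecycle described above can be modeled in a few lines of
userspace C. This is only a sketch: model_cmd, MODEL_SCMD_INITIALIZED,
model_init_command(), model_end_request() and model_dispatch() are
hypothetical names, not kernel APIs. It assumes, as described above,
that initialization happens only while the flag is clear and that the
completion path clears the flags; completion is modeled as immediate.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the SCMD_INITIALIZED bit in scsi_cmnd->flags. */
#define MODEL_SCMD_INITIALIZED 0x1u

/* Hypothetical model of the per-request SCSI command state. */
struct model_cmd {
	unsigned int flags;             /* stands in for scsi_cmnd->flags */
	unsigned long jiffies_at_alloc; /* refreshed only on initialization */
};

/* Like scsi_init_command(): initialize only if the flag is still clear. */
static void model_init_command(struct model_cmd *cmd, unsigned long now)
{
	if (!(cmd->flags & MODEL_SCMD_INITIALIZED)) {
		cmd->flags |= MODEL_SCMD_INITIALIZED;
		cmd->jiffies_at_alloc = now;
	}
}

/* Like scsi_done() -> scsi_end_request(): clearing the flags makes the
 * next user of this tag start with a fresh initialization. */
static void model_end_request(struct model_cmd *cmd)
{
	cmd->flags = 0;
}

/*
 * One dispatch of a cloned request. When the low-level driver rejects
 * the command (driver_ok == false), the completion path is skipped,
 * mirroring the qla2xxx error path, so the flag stays set.
 */
static void model_dispatch(struct model_cmd *cmd, unsigned long now,
			   bool driver_ok)
{
	model_init_command(cmd, now);
	if (driver_ok)
		model_end_request(cmd);
}
```

Dispatching a fresh model_cmd twice with driver_ok set refreshes
jiffies_at_alloc each time. After a single rejected dispatch, however,
the next dispatch skips initialization and keeps the old timestamp,
even if it happens much later.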
If all I/O traffic then stops for an extended period and later
resumes, the following sequence may occur:
blk_mq_sched_dispatch_requests()
  blk_mq_dispatch_rq_list()
    dm_mq_queue_rq()
      map_request()
        ti->type->clone_and_map_rq() // New cloned request uses tag 405 again
        blk_insert_cloned_request()
          scsi_queue_rq()
Within scsi_queue_rq(), scsi_init_command() does not call
scsi_initialize_rq() because the SCMD_INITIALIZED flag is already set,
so the command keeps the jiffies_at_alloc timestamp from its earlier
use. When the command completes in scsi_complete(), the
scsi_cmd_runtime_exceeded() check therefore sees an age far beyond the
allowed runtime (cmd_age=66636s in the log above), returns true, and
the command fails.
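The runtime check can be sketched as follows, assuming it compares
jiffies_at_alloc + (allowed + 1) * timeout against the current time
using the kernel's wraparound-safe time_before() comparison. The
function names below are hypothetical, and model_time_before() is a
userspace re-implementation, not the kernel macro itself.

```c
#include <stdbool.h>

/* Userspace re-implementation of the kernel's time_before(a, b):
 * true if tick a is earlier than tick b, tolerating counter wraparound. */
static bool model_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Hypothetical model of the runtime check: the command has exceeded its
 * budget once (allowed + 1) * timeout ticks have elapsed since
 * jiffies_at_alloc. A stale jiffies_at_alloc makes this true at once.
 */
static bool model_runtime_exceeded(unsigned long jiffies_at_alloc,
				   unsigned long now, int allowed,
				   unsigned long timeout)
{
	unsigned long wait_for = (unsigned long)(allowed + 1) * timeout;

	return model_time_before(jiffies_at_alloc + wait_for, now);
}
```

With ticks counted in seconds and assumed values allowed = 5 and
timeout = 120, wait_for is 720, matching the "waited 720s" message. A
timestamp left over from 66636 seconds earlier exceeds that budget
immediately, whereas a freshly initialized one would not.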
This issue first appears with commit 4abafdc4360d ("block: remove the
initialize_rq_fn blk_mq_ops method"). Before that change, the
initialize_rq_fn method unconditionally initialized the SCSI command in
blk_get_request(). There may also be other paths where a command is
queued in scsi_queue_rq() but scsi_done() is never called.
Anastasia Kovaleva (1):
scsi: uninit not completed scsi cmd
drivers/scsi/scsi_lib.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--
2.40.3