Re: [PATCH v2 2/2] scsi: sd: Rework asynchronous resume support

From: Bart Van Assche
Date: Thu Jul 21 2022 - 14:15:04 EST


On 7/21/22 01:07, Geert Uytterhoeven wrote:
> On Wed, Jul 20, 2022 at 8:04 PM Bart Van Assche <bvanassche@xxxxxxx> wrote:
>> That's surprising. Is there anything unusual about the test setup that I
>> should know, e.g. very small number of CPU cores or a very small queue
>> depth of the SATA device? How about adding pr_info() statements at the
>> start and end of the following functions and also before the return
>> statements in these functions to determine where execution of the START
>> command hangs?
>> * sd_start_done().
>> * sd_start_done_work().
>
> None of these functions seem to be called at all?

That's weird. This means that either sd_submit_start() hangs or the START command never completes. The latter is unlikely since the SCSI error handler is expected to abort commands that hang. It would also be weird if sd_submit_start() hung before the START command is submitted, since the code flow for submitting the START command is very similar to the code flow for submitting it without patch "scsi: sd: Rework asynchronous resume support" (calling scsi_execute()).
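
To narrow down which of the two cases applies, the same pr_info() pattern could be applied to sd_submit_start() itself. The sketch below only illustrates the pattern and is not the code from the patch: demo_submit_start() and its fail_early argument are hypothetical stand-ins, and pr_info() is mapped onto printf() so that the snippet builds in userspace. In the kernel, pr_info() comes from <linux/printk.h>.

```c
#include <stdio.h>

/* Userspace stand-in for the kernel's pr_info(). */
#define pr_info(...) printf(__VA_ARGS__)

/*
 * Hypothetical stand-in for a function being instrumented. The real
 * sd_submit_start() has a different signature and body.
 */
static int demo_submit_start(int fail_early)
{
	pr_info("%s: enter\n", __func__);

	if (fail_early) {
		/* Log before every return so the last message shows the exit path. */
		pr_info("%s: early return\n", __func__);
		return -1;
	}

	pr_info("%s: exit\n", __func__);
	return 0;
}
```

If "enter" shows up in the log without a matching "early return" or "exit" line, the hang is inside the function; if both show up, the command was submitted and never completed.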

What is also weird is that there are at least two SATA setups on which this code works fine, including my Qemu setup.

Although it is possible to enable tracing at boot time, adding the following parameters to the kernel command line would generate too much logging data:

tp_printk trace_event=block_rq_complete,block_rq_error,block_rq_insert,block_rq_issue,block_rq_merge,block_rq_remap,block_rq_requeue,scsi_dispatch_cmd_done,scsi_dispatch_cmd_start,scsi_eh_wakeup,scsi_dispatch_cmd_error,scsi_dispatch_cmd_timeout scsi_mod.scsi_logging_level=32256
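
For reference, here is how the scsi_mod.scsi_logging_level=32256 value above breaks down. The sketch assumes the layout of three bits per logging facility with the shift values from drivers/scsi/scsi_logging.h as I remember them, so please check them against your tree. The helper names facility_level() and print_levels() are made up for this example.

```c
#include <stdio.h>

/* Assumed facility shifts (3 bits each), recalled from drivers/scsi/scsi_logging.h. */
struct scsi_log_facility {
	const char *name;
	unsigned int shift;
};

static const struct scsi_log_facility facilities[] = {
	{ "ERROR", 0 },       { "TIMEOUT", 3 },   { "SCAN", 6 },
	{ "MLQUEUE", 9 },     { "MLCOMPLETE", 12 }, { "LLQUEUE", 15 },
	{ "LLCOMPLETE", 18 }, { "HLQUEUE", 21 },  { "HLCOMPLETE", 24 },
	{ "IOCTL", 27 },
};

/* Extract the 3-bit level of one facility from the combined value. */
static unsigned int facility_level(unsigned int logging_level, unsigned int shift)
{
	return (logging_level >> shift) & 0x7;
}

/* Print the level of every facility, one per line. */
static void print_levels(unsigned int logging_level)
{
	size_t i;

	for (i = 0; i < sizeof(facilities) / sizeof(facilities[0]); i++)
		printf("%-10s %u\n", facilities[i].name,
		       facility_level(logging_level, facilities[i].shift));
}
```

Under these assumptions 32256 equals (7 << 9) | (7 << 12), that is, the maximum level for the MLQUEUE and MLCOMPLETE facilities and zero for all others. Calling print_levels(32256) lists the per-facility levels.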

I'm not sure what the best way is to proceed since I cannot reproduce this issue.

Bart.