On Wed, Jul 20, 2022 at 6:51 PM Bart Van Assche <bvanassche@xxxxxxx> wrote:
I'm not familiar with the SATA code but from a quick look it seems like
the above code is only triggered from inside the ATA error handler
(ata_do_eh() -> ata_eh_recover() -> ata_eh_revalidate_and_attach() ->
schedule_work(&(ap->scsi_rescan_task)) -> ata_scsi_dev_rescan()). It
doesn't seem normal to me that the ATA error handler gets invoked during
a resume. How about testing the following two code changes?
Thanks for your suggestions!
* In sd_start_stop_device(), change "return sd_submit_start(sdkp, cmd,
sizeof(cmd))" into "sd_submit_start(sdkp, cmd, sizeof(cmd))" and below
that call add "flush_work(&sdkp->start_done_work)". This makes
sd_start_stop_device() again synchronous. This will tell us whether the
behavior change is caused by submitting the START command from another
context or by not waiting until the START command has finished.
Unfortunately this doesn't have any impact.
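
For readers following along, the difference between the asynchronous and
the synchronous variant tested above can be sketched in user space. This is
only an analogy under stated assumptions, not the kernel code: struct
start_work, submit_start(), flush_start() and start_sync() are hypothetical
names, a pthread stands in for the work item, and pthread_join() plays the
role that flush_work() plays in the experiment.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a deferred START command; not kernel code. */
struct start_work {
	pthread_t thread;
	bool done;	/* set by the worker once the "command" has finished */
};

static void *start_worker(void *arg)
{
	struct start_work *w = arg;

	w->done = true;	/* pretend the START command completed here */
	return NULL;
}

/* Asynchronous submit: returns as soon as the worker has been created. */
static void submit_start(struct start_work *w)
{
	int ret;

	w->done = false;
	ret = pthread_create(&w->thread, NULL, start_worker, w);
	assert(ret == 0);
}

/* Rough analogue of flush_work(): block until the worker has finished. */
static void flush_start(struct start_work *w)
{
	int ret = pthread_join(w->thread, NULL);

	assert(ret == 0);
}

/* Synchronous variant: submit, then wait for completion before returning. */
static bool start_sync(struct start_work *w)
{
	submit_start(w);
	flush_start(w);
	return w->done;
}
```

Calling start_sync() returns only after the worker has run, whereas calling
submit_start() alone returns immediately and leaves completion to happen
later in another context.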
* Back out the above change, change "return sd_submit_start(sdkp, cmd,
sizeof(cmd))" again into "sd_submit_start(sdkp, cmd, sizeof(cmd))" and
below that statement add a call to
scsi_run_queue(sdkp->device->request_queue). If this change helps it
(that's the static scsi_run_queue() in drivers/scsi/scsi_lib.c?)
means that the scsi_run_queue() call is necessary to prevent reordering
of the START command with other SCSI commands.
Unfortunately this doesn't have any impact either.