Re: [PATCH v4 06/10] scsi: ufs: Remove host_sem used in suspend/resume

From: Bart Van Assche
Date: Mon Jun 28 2021 - 13:31:53 EST


On 6/28/21 1:17 AM, Can Guo wrote:
> On 2021-06-25 01:11, Bart Van Assche wrote:
>> On 6/23/21 11:31 PM, Can Guo wrote:
>>> Using back host_sem in suspend_prepare()/resume_complete() won't have
>>> this problem of deadlock, right?
>>
>> Although that would solve the deadlock discussed in this email thread, it
>> wouldn't solve the issue of potential adverse interactions of the UFS
>> error handler and the SCSI error handler running concurrently.
>
> I think I've explained it before, paste it here -
>
> ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and flushes it,
> so SCSI error handler and UFS error handler can safely run together.

That code path is the exception. Do you agree that the following three
functions all invoke the ufshcd_err_handler() function asynchronously?
* ufshcd_uic_pwr_ctrl()
* ufshcd_check_errors()
* ufshcd_abort()

>> How about using the
>> standard approach for invoking the UFS error handler instead of using
>> a custom
>> mechanism, e.g. by using something like the (untested) patch below? This
>> approach guarantees that the UFS error handler is only activated after
>> all
>> pending SCSI commands have failed or timed out and also guarantees
>> that no new
>> SCSI commands will be queued while the UFS error handler is in
>> progress (see
>> also scsi_host_queue_ready()).
>
> Per my understanding, SCSI error handling is scsi cmd based, meaning it
> only works when certain SCSI cmds failed [ ... ]
That is not completely correct. The SCSI error handler is activated if
either all pending commands have failed or if it is scheduled
explicitly. Please take a look at the host_eh_scheduled member variable,
how it is used and also at scsi_schedule_eh(). The scsi_schedule_eh()
function was introduced in 2006 and that the ATA code uses it since then
to activate the SCSI error handler even if no commands are pending. See
also the patch "SCSI: make scsi_implement_eh() generic API for SCSI
transports".

> However, most UFS (UIC) errors happens during gear scaling, clk gating
> and suspend/resume (due to power mode changes and/or hibern8
> enter/exit), during which there is NO scsi cmds in UFS driver at all
> (because these contexts start only when there is no ongoing data
> transactions).

Activating the SCSI error handler if no SCSI commands are in progress is
supported by scsi_schedule_eh().

> Thus, scsi_unjam_host() won't even call scsi_eh_ready_devs() because
> scsi_eh_get_sense() always returns TRUE in these cases (eh_work_q is
> empty).

Please take another look at the patch in my previous message. There is a
scsi_transport_template instance in that patch. The eh_strategy_handler
defined in a SCSI transport template is called *instead* of
scsi_unjam_host(). In other words, scsi_unjam_host() won't be called if
my patch would be applied to the UFS driver.

Please let me know if you need more information.

Bart.