Re: [PATCH v4 06/10] scsi: ufs: Remove host_sem used in suspend/resume

From: Can Guo
Date: Tue Jun 29 2021 - 02:23:40 EST


On 2021-06-29 01:31, Bart Van Assche wrote:
On 6/28/21 1:17 AM, Can Guo wrote:
On 2021-06-25 01:11, Bart Van Assche wrote:
On 6/23/21 11:31 PM, Can Guo wrote:
Using back host_sem in suspend_prepare()/resume_complete() won't have
this problem of deadlock, right?

Although that would solve the deadlock discussed in this email thread, it
wouldn't solve the issue of potential adverse interactions of the UFS
error handler and the SCSI error handler running concurrently.

I think I've explained it before, paste it here -

ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and flushes it,
so SCSI error handler and UFS error handler can safely run together.

That code path is the exception. Do you agree that the following three
functions all invoke the ufshcd_err_handler() function asynchronously?
* ufshcd_uic_pwr_ctrl()
* ufshcd_check_errors()
* ufshcd_abort()


I agree, but I don't see what's wrong with that. Any context can invoke
ufs error handler asynchronously and ufs error handler prepare makes
sure error handler can work safely, i.e., stopping PM ops/gating/scaling
in error handler prepare makes sure no one shall call ufshcd_uic_pwr_ctrl()
ever again. And ufshcd_check_errors() and ufshcd_abort() are OK to run
concurrently with UFS error handler.

How about using the
standard approach for invoking the UFS error handler instead of using
a custom
mechanism, e.g. by using something like the (untested) patch below? This
approach guarantees that the UFS error handler is only activated after
all
pending SCSI commands have failed or timed out and also guarantees
that no new
SCSI commands will be queued while the UFS error handler is in
progress (see
also scsi_host_queue_ready()).

Per my understanding, SCSI error handling is scsi cmd based, meaning it
only works when certain SCSI cmds failed [ ... ]
That is not completely correct. The SCSI error handler is activated if
either all pending commands have failed or if it is scheduled
explicitly. Please take a look at the host_eh_scheduled member variable,
how it is used and also at scsi_schedule_eh(). The scsi_schedule_eh()
function was introduced in 2006 and that the ATA code uses it since then
to activate the SCSI error handler even if no commands are pending. See
also the patch "SCSI: make scsi_implement_eh() generic API for SCSI
transports".

However, most UFS (UIC) errors happens during gear scaling, clk gating
and suspend/resume (due to power mode changes and/or hibern8
enter/exit), during which there is NO scsi cmds in UFS driver at all
(because these contexts start only when there is no ongoing data
transactions).

Activating the SCSI error handler if no SCSI commands are in progress is
supported by scsi_schedule_eh().

Thus, scsi_unjam_host() won't even call scsi_eh_ready_devs() because
scsi_eh_get_sense() always returns TRUE in these cases (eh_work_q is
empty).

Please take another look at the patch in my previous message. There is a
scsi_transport_template instance in that patch. The eh_strategy_handler
defined in a SCSI transport template is called *instead* of
scsi_unjam_host(). In other words, scsi_unjam_host() won't be called if
my patch would be applied to the UFS driver.

Please let me know if you need more information.

Sorry that I missed the change of scsi_transport_template() in your previous
message. I can understand that you want to invoke UFS error hander by invoking
SCSI error handler, but I didn't go that far because I saw you changed
pm_runtime_get_sync() to pm_runtime_get_noresume() in ufs error handler prepare.
How can that change make sure that the device is not suspending or resuming
while error handler is running?

Thanks,

Can Guo.


Bart.