Re: [PATCH v3 8/9] scsi: ufs: Update the fast abort path in ufshcd_abort() for PM requests
From: Can Guo
Date: Sun Jun 13 2021 - 10:44:00 EST
On 2021-06-13 00:50, Bart Van Assche wrote:
On 6/12/21 12:07 AM, Can Guo wrote:
Sigh... I also want my life and work to be easier...
How about reducing the number of states and state transitions in the
driver? One source of complexity is that ufshcd_err_handler() is
independently of the SCSI error handler and hence may run concurrently
with the SCSI error handler. Has the following already been considered?
- Call ufshcd_err_handler() synchronously from ufshcd_abort() and
ufshcd_eh_host_reset_handler() instead of asynchronously.
1. ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and
it, so it is synchronous. ufshcd_eh_host_reset_handler() used to call
reset_and_restore() directly, which can run concurrently with UFS error
so I fixed it last year .
2. ufshcd_abort() invokes ufshcd_err_handler() synchronously can have a
live lock issue, which is why I chose the asynchronous way (from the
day I started to fix error handling). The live lock happens when abort
to a PM request, e.g., a SSU cmd sent from suspend/resume. Because UFS
handler is synchronized with suspend/resume (by calling
and lock_system_sleep()), the sequence is like:
 ufshcd_wl_resume() sends SSU cmd
 ufshcd_abort() calls UFS error handler
 UFS error handler calls lock_system_sleep() and
In above sequence, either lock_system_sleep() or pm_runtime_get_sync()
be blocked -  is blocked by ,  is blocked by , while  is
blocked by .
For PM requests, I chose to abort them fast to unblock suspend/resume,
suspend/resume shall fail of course, but UFS error handler recovers
PM errors anyways.
- Call scsi_schedule_eh() from ufshcd_uic_pwr_ctrl() and
ufshcd_check_errors() instead of ufshcd_schedule_eh_work().
When ufshcd_uic_pwr_ctrl() and/or ufshcd_check_errors() report errors,
usually they are fatal errors, according to UFSHCI spec, SW should
UFS to recover.
However scsi_schedule_eh() does more than that - scsi_unjam_host() sends
request sense cmd and calls scsi_eh_ready_devs(), while
sends test unit ready cmd and calls all the way down to
bus/host_reset(). But we only need scsi_eh_host_reset() in this case. I
you have concerns that scsi_schedule_eh() may run concurrently with UFS
handler, but as I mentioned above in  - I've made
synchronized with UFS error handler, hope that can ease your concern.
I am not saying your idea won't work, it is a good suggestion. I will
it after these changes go in, because it would require extra effort and
effort won't be minor - I need to consider how to remove/reduce the
states along with the change and the error injection and stability test
over again, which is a long way to go. As for now, at least current
works well as per my test and we really need these changes for
These changes will guarantee that all commands have completed or timed
out before ufshcd_err_handler() is called. I think that would allow to
remove e.g. the following code from the error handler:
/* Drain ufshcd_queuecommand() */