Re: [PATCH v3 5/9] scsi: ufs: Simplify error handling preparation

From: Can Guo
Date: Sat Jun 12 2021 - 05:49:40 EST


Hi Bart,

On 2021-06-12 14:46, Can Guo wrote:
On 2021-06-12 04:58, Bart Van Assche wrote:
On 6/10/21 8:01 PM, Can Guo wrote:
Previously, without commit cb7e6f05fce67c965194ac04467e1ba7bc70b069,
ufshcd_resume() could turn off pwr and clk due to a UFS error, e.g., a link
transition failure or an SSU error/abort (and these UFS errors would
invoke error handling). When error handling kicks in, it should
re-enable the pwr and clk before proceeding. Now, commit
cb7e6f05fce67c965194ac04467e1ba7bc70b069 makes ufshcd_resume()
purely control pwr and clk, meaning if ufshcd_resume() fails, there
is nothing we can do about it - pwr or clk enabling must have failed,
and it is not because of UFS error. This is why I am removing the
re-enabling pwr/clk in error handling prepare.

Why are link transition failures handled in the error handler instead of
in the context where these errors are detected (ufshcd_resume())? Is it
even possible to recover from a link transition failure or does this
perhaps indicate a broken UFS controller?

Basically, almost all UFS failures are caused by errors in the underlying
layers, i.e., UIC errors, including link transition failures. According to
the UFSHCI spec, SW should do a full reset to recover, just as it does for
any other fatal UIC error. All UIC errors are detected by HW and reported
through the IRQ handler.

UFSHCI Spec Ver. 31
8.2.7 Hibernate Enter/Exit Error Handling
Hibernate Enter/Exit Error occurs when the UniPro link is broken. When
this condition occurs, host software should reset the host controller by
setting register HCE to ‘0’, re-initialize the host controller by setting
register HCE to ‘1’, and then start link startup sequence as shown in
Figure 16.
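
As a rough illustration of the recovery order quoted above, the following
is a userspace simulation only, not ufshcd code. Every sim_* name and the
struct layout are hypothetical; the only thing taken from the spec text is
the order: HCE to 0, HCE to 1, then link startup.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated controller state; NOT the real struct ufs_hba. */
struct sim_hba {
	uint32_t hce;      /* models the HCE (Host Controller Enable) register */
	bool link_up;      /* models whether the UniPro link is usable */
	int reset_count;   /* number of times HCE was written to 0 */
};

/* Model a write to HCE: clearing it resets the controller and drops the link. */
static void sim_write_hce(struct sim_hba *hba, uint32_t val)
{
	hba->hce = val & 1u;
	if (hba->hce == 0) {
		hba->link_up = false;
		hba->reset_count++;
	}
}

/* Model link startup: it can only succeed while the controller is enabled. */
static bool sim_link_startup(struct sim_hba *hba)
{
	if (hba->hce != 1)
		return false;
	hba->link_up = true;
	return true;
}

/* Full reset after a hibernate enter/exit error, in the order the spec gives. */
static bool sim_recover_from_hibern8_error(struct sim_hba *hba)
{
	sim_write_hce(hba, 0);        /* reset the host controller */
	sim_write_hce(hba, 1);        /* re-initialize the host controller */
	return sim_link_startup(hba); /* then start the link startup sequence */
}
```

In this model, link startup is refused unless HCE has been set back to 1,
which is why the order of the three steps matters.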


but what I really wonder is why we don't just do recovery directly
in __ufshcd_wl_suspend() and __ufshcd_wl_resume() and strip all
the PM complexity out of ufshcd_err_handling()?

+1

I've explained why I chose not to do this in my last reply to Adrian.
Please kindly check it.


For system suspend/resume, since error handling is similar in nature
to user access, we are using host_sem to avoid concurrency between
error handling and system suspend/resume.

Why is host_sem used for that purpose instead of lock_system_sleep() and
unlock_system_sleep()?


I was aware of it, but the situation is that host_sem is also used to
avoid concurrency among user access, error handling and shutdown, so
I thought I would just use host_sem anyway to simplify the locking.
Otherwise user access and error handling would have to take both
system_transition_mutex and host_sem.

On second thought, I will take your suggestion to use lock_system_sleep()
and unlock_system_sleep() in the error handler and remove the host_sem
usage in suspend/resume, which makes the code more readable by keeping the
changes within the error handler itself. However, please note that host_sem
will still be used to avoid concurrency among user access, the error handler
and shutdown.
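
To make the proposed split concrete, here is a minimal userspace model of
it. It is a sketch under assumptions, not the actual patch: the two flags
only stand in for system_transition_mutex (what lock_system_sleep() takes)
and host_sem, all sim_* names are hypothetical, the lock order in the error
handler is assumed, and try-locks are used where the kernel would block.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-ins for the kernel locks; these are NOT kernel primitives. */
static atomic_flag sim_system_transition = ATOMIC_FLAG_INIT; /* system_transition_mutex */
static atomic_flag sim_host_sem = ATOMIC_FLAG_INIT;          /* host_sem */

static bool sim_trylock(atomic_flag *f) { return !atomic_flag_test_and_set(f); }
static void sim_unlock(atomic_flag *f)  { atomic_flag_clear(f); }

/* Error handler: takes both, so it excludes suspend/resume and host_sem users. */
static bool sim_err_handler_begin(void)
{
	if (!sim_trylock(&sim_system_transition))
		return false;
	if (!sim_trylock(&sim_host_sem)) {
		sim_unlock(&sim_system_transition);
		return false;
	}
	return true;
}

static void sim_err_handler_end(void)
{
	sim_unlock(&sim_host_sem);
	sim_unlock(&sim_system_transition);
}

/* System suspend/resume: serialized by the system sleep lock only. */
static bool sim_system_suspend_begin(void) { return sim_trylock(&sim_system_transition); }
static void sim_system_suspend_end(void)   { sim_unlock(&sim_system_transition); }

/* User access and shutdown: serialized by host_sem only. */
static bool sim_user_access_begin(void) { return sim_trylock(&sim_host_sem); }
static void sim_user_access_end(void)   { sim_unlock(&sim_host_sem); }
```

In this model only the error handler takes two locks; the suspend/resume
path no longer touches host_sem, and user access never needs
system_transition_mutex.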

Thanks,
Can Guo.


Thanks,

Can Guo.

Thanks,

Bart.