Re: [PATCH v1] scsi: ufs: clear doorbell for hibern8 errors when using ah8

From: Bart Van Assche
Date: Thu Oct 07 2021 - 17:47:16 EST


On 10/7/21 12:40 AM, Kiwoong Kim wrote:
When an scsi command is dispatched right after host complete
all the pending requests goes to idle and ufs driver tries
to ring a doorbell, host might be still during entering into
hibern8. If an error occurrs during that period, the doorbell
might not be zero. In this case, clearing it should be needed
to requeue its command.
Currently, ufshcd_err_handler goes directly to reset when the
driver's link state is broken. This patch is to make it clear
doorbells in the case. In the situation, communication would
be disabled, so TM isn't necessary or we can say it's not
available.

The above text is too hard to comprehend. Please make it more clear. As an example, in the first sentence, what does "goes to idle" apply to? Does it apply to "SCSI command", "pending requests" or something else?

What mechanism does "If an error occurs" refer to? A SCSI command processing error, a link error or another type of error?

Here's an actual symptom that I've faced. At the time, tag #17
is still pended even after host reset. And then the block timer
is expired.

pended -> pending.

exynos-ufs 11100000.ufs: ufshcd_check_errors: Auto Hibern8
Enter failed - status: 0x00000040, upmcrs: 0x00000001
..
host_regs: 00000050: b8671000 00000008 00020000 00000000
..
exynos-ufs 11100000.ufs: ufshcd_abort: Device abort task at tag 17

The relationship between the example and the description above the example is not clear. If a command is pending before the error handler starts, aborting that command fails and the host is not reset then the command will still be pending after the error handler has finished.

@@ -6138,7 +6139,12 @@ static void ufshcd_err_handler(struct Scsi_Host *host)
(hba->saved_uic_err & (UFSHCD_UIC_DL_NAC_RECEIVED_ERROR |
UFSHCD_UIC_DL_TCx_REPLAY_ERROR)))) {
needs_reset = true;
- goto do_reset;
+ spin_lock_irqsave(hba->host->host_lock, flags);
+ if (!hba->ahit && ufshcd_is_link_broken(hba))
+ link_broken_in_ah8 = true;
+ spin_unlock_irqrestore(hba->host->host_lock, flags);
+ if (!link_broken_in_ah8)
+ goto do_reset;
}

My understanding is that hba->ahit == 0 means that auto-hibernation is disabled. Hence, the above code sets 'link_broken_in_ah8' only if auto-hibernation is disabled. So what does the name of the variable 'link_broken_in_ah8' mean?

@@ -6168,6 +6174,9 @@ static void ufshcd_err_handler(struct Scsi_Host *host)
}
}
+ if (link_broken_in_ah8)
+ goto lock_skip_pending_xfer_clear;
+
/* Clear pending task management requests */
for_each_set_bit(tag, &hba->outstanding_tasks, hba->nutmrs) {
if (ufshcd_clear_tm_cmd(hba, tag)) {

Why is skipping the ufshcd_clear_tm_cmd() calls useful in this case?

Thanks,

Bart.