Re: [PATCH] ata: libata-eh: Clear scsicmd->result when setting SAM_STAT_CHECK_CONDITION
From: Niklas Cassel
Date: Thu Sep 05 2024 - 04:34:16 EST
Hello Igor,
On Wed, Sep 04, 2024 at 10:37:27PM +0000, Igor Pylypiv wrote:
> commit 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to
> not set CHECK_CONDITION") changed the way how SAM_STAT_CHECK_CONDITION is
> set. Assignment "scmd->result = SAM_STAT_CHECK_CONDITION;" was replaced by
> set_status_byte() which does not clear the scsicmd->result.
"which does not clear the scsicmd->result"
scsicmd->result is a combination of:
-SCSI status byte
-SCSI ML byte
-host byte
Please be more specific of which byte(s) that you want to clear,
both here and in other places in the commit message.
>
> By not clearing the scsicmd->result we end up in a state where both
> the DID_TIME_OUT host byte and the SAM_STAT_CHECK_CONDITION status
> bytes are set.
>
> The DID_TIME_OUT host byte is getting set a part of error handling:
>
> ata_qc_complete()
> ata_qc_schedule_eh()
> blk_abort_request()
> WRITE_ONCE(req->deadline, jiffies);
>
> blk_mq_timeout_work()
> blk_mq_check_expired()
> blk_mq_rq_timed_out()
> req->q->mq_ops->timeout() / scsi_timeout()
> set_host_byte(scmd, DID_TIME_OUT);
I would have reorder your commit log and have this first in the commit log.
>
> Having the host byte set to DID_TIME_OUT for a command that didn't timeout
> is confusing. Let's bring the old behavior back by setting scmd->result to
> SAM_STAT_CHECK_CONDITION.
I think you are missing something very important in the commit log here:
What is the user visible change before and after your change?
Is there a difference is the error message in dmesg? If not, that should
be mentioned as well.
>
> Fixes: 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to not set CHECK_CONDITION")
> Signed-off-by: Igor Pylypiv <ipylypiv@xxxxxxxxxx>
> ---
> drivers/ata/libata-eh.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 214b935c2ced..4927b40e782f 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -1605,7 +1605,7 @@ static unsigned int ata_eh_analyze_tf(struct ata_queued_cmd *qc)
> */
> if (!(qc->flags & ATA_QCFLAG_SENSE_VALID) &&
> (stat & ATA_SENSE) && ata_eh_request_sense(qc))
> - set_status_byte(qc->scsicmd, SAM_STAT_CHECK_CONDITION);
> + qc->scsicmd->result = SAM_STAT_CHECK_CONDITION;
ata_eh_analyze_tf() will only be called on commands that are owned by EH
(has ATA_QCFLAG_EH set).
Thus this command will end up in:
ata_eh_finish() -> ata_eh_qc_complete() -> __ata_eh_qc_complete() ->
-> __ata_qc_complete() -> qc->complete_fn().
complete_fn will be (except in special cases): ata_scsi_qc_complete()
If you look at ata_scsi_qc_complete(), it already clears the host byte:
https://github.com/torvalds/linux/blob/v6.11-rc6/drivers/ata/libata-scsi.c#L1695-L1696
So could you please be more specific of what problem this change is fixing?
Is it just that you think that it makes sense to clear the host byte earlier
in the call chain?
There are many different paths a QC can take via EH, e.g. policy 0xD NCQ
commands will not fetch sense data via ata_eh_request_sense(), so clearing
the host byte in ata_scsi_qc_complete() should be fine, otherwise we need
to do a similar change to yours in all the different code paths that sets
sense data ...which might actually be something that makes sense, but then
I would expect a patch series that changes all the locations where we set
sense data, not just in ata_eh_analyze_tf(), and then drops the clearing in
ata_scsi_qc_complete() (which was introduced in commit 7574a8377c7a ("ata:
libata-scsi: do not overwrite SCSI ML and status bytes")).
See this patch to see all the places where we set sense data:
https://git.kernel.org/pub/scm/linux/kernel/git/libata/linux.git/commit/?h=for-6.12&id=9526dec226f0779d72f798e7a18375bf8d414775
Side note:
You might also be interested to know that a command that was sent via EH will
be finished using scsi_eh_finish_cmd() (called by __ata_eh_qc_complete()),
and will thus end up in scsi_eh_flush_done_q(). A command that has sense data
will have scmd->retries == scmd->allowed (set by ata_eh_qc_complete()), so you
will end up in this code path:
https://github.com/torvalds/linux/blob/v6.11-rc6/drivers/scsi/scsi_error.c#L2213-L2227
Which means that SCSI EH will set DID_TIME_OUT for any command that does
not have (SCSI ML byte || SCSI status byte || host byte) set.
A command with sense data will have most often have CHECK_CONDITION set, but
there is also CDL policy 0xD commands (which will have the SCSI ML byte set).
So this special flag SCMD_FORCE_EH_SUCCESS is for commands that were completed
successfully without any SK/ASC/ASCQ in the same interrupt as other policy 0xD
commands which did have SK/ASC/ASCQ set.
For details, see 3d848ca1ebc8 ("scsi: core: Allow libata to complete successful
commands via EH").
Kind regards,
Niklas