[PATCH 2/3] scsi: pm80xx: Do not issue hard reset before NCQ EH

From: TJ Adams
Date: Fri Jun 07 2024 - 13:58:23 EST


From: Igor Pylypiv <ipylypiv@xxxxxxxxxx>

v6.2 commit 811be570a9a8 ("scsi: pm8001: Use sas_ata_device_link_abort()
to handle NCQ errors") removed duplicate NCQ EH from the pm80xx driver
and started relying on libata to handle the NCQ errors. The PM8006
controller has a special EH sequence that was added in v4.15 commit
869ddbdcae3b ("scsi: pm80xx: corrected SATA abort handling sequence.").
The special EH sequence issues a hard reset to the drive before libata
EH has a chance to read the NCQ log page. Libata EH is confused by the
empty NCQ log page, which results in an HSM violation. The failed
command is retried a few times and fails each time with the same HSM
violation. Finally, libata disables NCQ because of the repeated HSM
violations.

To avoid the unwanted hard reset, initiate an abort all from the driver
so that libsas EH does not call lldd_abort_task()/pm8001_abort_task().
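
The ordering problem described above can be illustrated with a small
userspace toy model. This is only a sketch and not kernel code: every
type and function below (toy_drive, toy_abort_task(), toy_abort_all(),
toy_libsas_eh(), toy_handle_ncq_error()) is hypothetical, and the model
assumes that libsas EH calls the abort handler only while commands are
still outstanding and that the hard reset leaves the NCQ log page empty.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_chip { TOY_CHIP_8001, TOY_CHIP_8006 };

struct toy_drive {
	bool ncq_log_valid;	/* NCQ log page still describes the failure */
	int outstanding;	/* commands the controller still owns */
};

/*
 * Stand-in for pm8001_abort_task(): on the 8006-style chip the special
 * sequence hard resets the drive, which is modelled as emptying the log.
 */
static void toy_abort_task(enum toy_chip chip, struct toy_drive *d)
{
	if (chip == TOY_CHIP_8006)
		d->ncq_log_valid = false;
	d->outstanding = 0;
}

/* Stand-in for a driver-initiated abort all: no reset reaches the drive. */
static void toy_abort_all(struct toy_drive *d)
{
	d->outstanding = 0;
}

/* Assumption: EH invokes the abort handler only for outstanding commands. */
static void toy_libsas_eh(enum toy_chip chip, struct toy_drive *d)
{
	if (d->outstanding > 0)
		toy_abort_task(chip, d);
}

/* Returns true if the NCQ log page is still usable when libata EH runs. */
static bool toy_handle_ncq_error(enum toy_chip chip, bool patched)
{
	struct toy_drive d = { .ncq_log_valid = true, .outstanding = 3 };

	if (patched && chip == TOY_CHIP_8006)
		toy_abort_all(&d);	/* the step this patch adds */
	toy_libsas_eh(chip, &d);
	return d.ncq_log_valid;
}
```

In this model, toy_handle_ncq_error(TOY_CHIP_8006, false) loses the log
page, while the same call with patched set to true keeps it, because no
command is left for the toy EH to abort.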

Signed-off-by: Igor Pylypiv <ipylypiv@xxxxxxxxxx>
Signed-off-by: Terrence Adams <tadamsjr@xxxxxxxxxx>
---
drivers/scsi/pm8001/pm8001_hwi.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/drivers/scsi/pm8001/pm8001_hwi.c b/drivers/scsi/pm8001/pm8001_hwi.c
index dec1e2d380f1..f19f76dc6e1c 100644
--- a/drivers/scsi/pm8001/pm8001_hwi.c
+++ b/drivers/scsi/pm8001/pm8001_hwi.c
@@ -1672,7 +1672,18 @@ void pm8001_work_fn(struct work_struct *work)
 		break;
 	case IO_XFER_ERROR_ABORTED_NCQ_MODE:
 	{
+		struct pm8001_hba_info *pm8001_ha = pw->pm8001_ha;
 		dev = pm8001_dev->sas_device;
+		/*
+		 * pm8001_abort_task() issues a hard reset to a drive
+		 * before libata EH has a chance to read the NCQ log page.
+		 *
+		 * Initiate abort all from the driver to prevent libsas EH
+		 * from calling lldd_abort_task() / pm8001_abort_task().
+		 */
+		if (pm8001_ha->chip_id == chip_8006)
+			sas_execute_internal_abort_dev(dev, 0, NULL);
+
 		sas_ata_device_link_abort(dev, false);
 	}
 		break;
--
2.45.2.505.gda0bf45e8d-goog