I think that it gets frozen when the internal command for read log ext times out.
More below about that timeout.
ata_read_log_page() will first try to read using READ LOG DMA EXT.
If that fails it will retry with READ LOG EXT.
Your log has this:
[ 350.257870] ata1.00: qc timeout (cmd 0x47)
So it is definitely ATA_CMD_READ_LOG_DMA_EXT that times out.
On timeout, ata_exec_internal_sg() will freeze the port:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-core.c?h=v6.0#n1577
When ata_read_log_page() retries with the port frozen,
READ LOG EXT will obviously fail (since the port is frozen).
Not sure why READ LOG DMA EXT would time out for you...
Perhaps your drive does not implement this command,
and incorrectly reports support for it in its IDENTIFY DEVICE data,
which is what ata_id_has_read_log_dma_ext() checks.
Perhaps you could try booting your kernel with libata.force=nodmalog
on the kernel command line, so that ata_read_log_page() will use
READ LOG EXT on the first try.
Damien, it seems that there is no use in retrying if the port
is frozen/we got a timeout, so perhaps:
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index e74ab6c0f1a0..1aa628332c8e 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2035,7 +2035,8 @@ unsigned int ata_read_log_page(struct ata_device *dev, u8 log,
 	if (err_mask) {
 		if (dma) {
 			dev->horkage |= ATA_HORKAGE_NO_DMA_LOG;
-			goto retry;
+			if (err_mask != AC_ERR_TIMEOUT)
+				goto retry;
 		}
or:
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index e74ab6c0f1a0..2fa03b7573ac 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2035,7 +2035,8 @@ unsigned int ata_read_log_page(struct ata_device *dev, u8 log,
 	if (err_mask) {
 		if (dma) {
 			dev->horkage |= ATA_HORKAGE_NO_DMA_LOG;
-			goto retry;
+			if (!(dev->link->ap->pflags & ATA_PFLAG_FROZEN))
+				goto retry;
 		}
would be in order, so that we actually print the real error, instead of a bogus
AC_ERR_SYSTEM (returned by ata_exec_internal_sg()) when the port is frozen.
Out of curiosity, which arm64 SoC is this?
ata_do_link_abort() calls ata_eh_set_pending() without activating fast drain:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-eh.c?h=v6.0#n989
So I'm not sure why your port is frozen.
(The fast drain timer does freeze the port, but it shouldn't be enabled.)
It might be worthwhile to see who freezes the port in your case.
Might come from the command timeout. John has had many problems with the
pm80xx HBA in his Arm machine from a while back. Likely not a driver issue
but a hw one... No-one seems to be able to recreate the same problem.
We need to try the HBA on our Arm board to see what happens.
Yeah, it just looks to be the longstanding issue of using this card on my
arm64 machine, which is that I get IO timeouts quite regularly. I should
have mentioned that yesterday. This just seems to be a driver issue.
While it is very unlikely that this is your problem, I've encountered
an issue on an ARM board before where the PCIe controller was incorrectly
configured in device tree, causing the controller to miss interrupts,
which presented itself to the user as timeouts in the WiFi driver:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=97131f85c08e024df49480ed499aae8fb754067f