Re: [Regression] commit 8a4aeec8d(libata/ahci: accommodate tag ordered controllers)

From: Tejun Heo
Date: Thu Jun 05 2014 - 09:41:30 EST


Hello,

(cc'ing ahci_xgene folks)

On Thu, Jun 05, 2014 at 09:24:04PM +0800, Ming Lei wrote:
> On Thu, Jun 5, 2014 at 9:12 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> > On Thu, Jun 5, 2014 at 1:53 AM, Ming Lei <ming.lei@xxxxxxxxxxxxx> wrote:
> >> Hi Dan and Tejun,
> >>
> >> Looks the commit 8a4aeec8d(libata/ahci: accommodate tag ordered
> >> controllers) causes below sata failure[1] on APM AHCI controller.
> >>
> >> And the error does disappear after reverting the commit.
> >
> > Can you give some more details about the workload and the disk? What
> > is the APM AHCI?
>
> The commit causes the APM arm64 based system not bootable, and not
> special workload, just when mounting rootfs or reading init files.
>
> APM AHCI:
> drivers/ata/ahci_xgene.c

Urgh... so, the controller can't handle tags allocated in round-robin
instead of lowest-first? Ming, can you please test with different
controllers / disks / ssds and see whether the problem stays with the
controller? Loc, can you guys please look into it so that at least
the next revision hardware can fix the issue?

Provided that the tag allocation itself isn't broken, which isn't too
likely given the lack of bug reports on other controllers, we'll need
to blacklist ahci_xgene somehow. Either disable NCQ support on it or
implement an alternative lowest-first tag allocation for it. If this
actually is the controller freaking out about tags allocated in a
different order, I'm not sure how much confidence we can put in its
NCQ implementation tho and it might be a better idea to simply plug
NCQ support on it until we understand the details of the problem.

So, something like the following? Ming, can you please verify whether
the following makes the machine boot again?

Thanks.

diff --git a/drivers/ata/ahci_xgene.c b/drivers/ata/ahci_xgene.c
index 77c89bf..4950600 100644
--- a/drivers/ata/ahci_xgene.c
+++ b/drivers/ata/ahci_xgene.c
@@ -302,9 +302,16 @@ static struct ata_port_operations xgene_ahci_ops = {
.read_id = xgene_ahci_read_id,
};

+/*
+ * The controller seems to freak out depending on the order of tag
+ * allocation. NCQ support is disabled until the problem is better
+ * understood.
+ *
+ * http://lkml.kernel.org/g/CACVXFVMnM6w5ZLX=S8fJHeVoEqxH2JG3Wg=GPMeoVnOpAjsvAA@xxxxxxxxxxxxxx
+ */
static const struct ata_port_info xgene_ahci_port_info = {
- AHCI_HFLAGS(AHCI_HFLAG_NO_PMP | AHCI_HFLAG_YES_NCQ),
- .flags = AHCI_FLAG_COMMON | ATA_FLAG_NCQ,
+ AHCI_HFLAGS(AHCI_HFLAG_NO_PMP),
+ .flags = AHCI_FLAG_COMMON,
.pio_mask = ATA_PIO4,
.udma_mask = ATA_UDMA6,
.port_ops = &xgene_ahci_ops,
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/