Re: [Regression] commit 8a4aeec8d(libata/ahci: accommodate tag ordered controllers)

From: Loc Ho
Date: Thu Jun 05 2014 - 16:39:20 EST


Hi Tejun,

>> >> Looks the commit 8a4aeec8d(libata/ahci: accommodate tag ordered
>> >> controllers) causes below sata failure[1] on APM AHCI controller.
>> >>
>> >> And the error does disappear after reverting the commit.
>> >
>> > Can you give some more details about the workload and the disk? What
>> > is the APM AHCI?
>>
>> The commit causes the APM arm64 based system not bootable, and not
>> special workload, just when mounting rootfs or reading init files.
>>
>> APM AHCI:
>> drivers/ata/ahci_xgene.c
>
> Urgh... so, the controller can't handle tags allocated in round-robin
> instead of lowest-first? Ming, can you please test with different
> controllers / disks / ssds and see whether the problem stays with the
> controller? Loc, can you guys please look into it so that at least
> the next revision hardware can fix the issue?
>
> Provided that the tag allocation itself isn't broken, which isn't too
> likely given the lack of bug reports on other controllers, we'll need
> to blacklist ahci_xgene somehow. Either disable NCQ support on it or
> implement an alternative lowest-first tag allocation for it. If this
> actually is the controller freaking out about tags allocated in a
> different order, I'm not sure how much confidence we can put in its
> NCQ implementation tho and it might be a better idea to simply plug
> NCQ support on it until we understand the details of the problem.
>
> So, something like the following? Ming, can you please verify whether
> the following makes the machine boot again?
>
> Thanks.
>
> diff --git a/drivers/ata/ahci_xgene.c b/drivers/ata/ahci_xgene.c
> index 77c89bf..4950600 100644
> --- a/drivers/ata/ahci_xgene.c
> +++ b/drivers/ata/ahci_xgene.c
> @@ -302,9 +302,16 @@ static struct ata_port_operations xgene_ahci_ops = {
> .read_id = xgene_ahci_read_id,
> };
>
> +/*
> + * The controller seems to freak out depending on the order of tag
> + * allocation. NCQ support is disabled until the problem is better
> + * understood.
> + *
> + * http://lkml.kernel.org/g/CACVXFVMnM6w5ZLX=S8fJHeVoEqxH2JG3Wg=GPMeoVnOpAjsvAA@xxxxxxxxxxxxxx
> + */
> static const struct ata_port_info xgene_ahci_port_info = {
> - AHCI_HFLAGS(AHCI_HFLAG_NO_PMP | AHCI_HFLAG_YES_NCQ),
> - .flags = AHCI_FLAG_COMMON | ATA_FLAG_NCQ,
> + AHCI_HFLAGS(AHCI_HFLAG_NO_PMP),
> + .flags = AHCI_FLAG_COMMON,
> .pio_mask = ATA_PIO4,
> .udma_mask = ATA_UDMA6,
> .port_ops = &xgene_ahci_ops,

Suman is very familiar with this and he will proposal an solution.
Please be aware that this is only an issue with previous revision of
the chip. With the latest revision in house, this is not an issue.

-Loc
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/