Re: possible esata regression in 2.6.35

From: Gwendal Grignou
Date: Fri Aug 27 2010 - 18:56:15 EST


I found the problem: in ata_sff_pio_task:

	struct ata_queued_cmd *qc = ap->port_task_data;

has been replaced by:

	/* qc can be NULL if timeout occurred */
	qc = ata_qc_from_tag(ap, ap->link.active_tag);
	if (!qc)
		return;

That does not work in the port multiplier case, because the link to look
at is not ap->link: ap->link.active_tag is ATA_TAG_POISON.

I will submit a patch that re-introduces port_task_data, this time
containing the link to look at.
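
To make the failure mode concrete, here is a minimal user-space sketch of
the lookup, not the kernel code. The structs are simplified stand-ins for
ata_port, ata_link and ata_queued_cmd, and the names pio_task_link,
issue_on_pmp_link and the two pio_task_qc_* helpers are hypothetical,
chosen only for illustration. The poison value mirrors the kernel's "no
active command" marker. It assumes a command issued to a device behind
the PMP sets active_tag only on that device's fan-out link.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the libata structures (assumption: not the
 * real kernel definitions, only the fields needed for the argument). */
#define TAG_POISON 0xfafbfcfdU	/* "no active command" marker */
#define MAX_QUEUE  32
#define PMP_LINKS  6

struct queued_cmd { unsigned int tag; int in_use; };
struct link { unsigned int active_tag; };

struct port {
	struct link link;			/* host link */
	struct link pmp_link[PMP_LINKS];	/* fan-out links behind the PMP */
	struct queued_cmd qcmd[MAX_QUEUE];
	struct link *pio_task_link;		/* hypothetical: link the PIO task must use */
};

static void port_init(struct port *ap)
{
	size_t i;

	ap->link.active_tag = TAG_POISON;
	for (i = 0; i < PMP_LINKS; i++)
		ap->pmp_link[i].active_tag = TAG_POISON;
	for (i = 0; i < MAX_QUEUE; i++)
		ap->qcmd[i].in_use = 0;
	ap->pio_task_link = NULL;
}

/* Returns NULL for a poisoned or otherwise invalid tag. */
static struct queued_cmd *qc_from_tag(struct port *ap, unsigned int tag)
{
	if (tag >= MAX_QUEUE || !ap->qcmd[tag].in_use)
		return NULL;
	return &ap->qcmd[tag];
}

/* Issue a PIO command to the device behind fan-out link 'pmp'.
 * Only that link's active_tag is set; the host link stays poisoned. */
static void issue_on_pmp_link(struct port *ap, unsigned int pmp, unsigned int tag)
{
	assert(pmp < PMP_LINKS && tag < MAX_QUEUE);
	ap->qcmd[tag].tag = tag;
	ap->qcmd[tag].in_use = 1;
	ap->pmp_link[pmp].active_tag = tag;
	ap->pio_task_link = &ap->pmp_link[pmp];	/* remember the link */
}

/* Lookup as done after the change: always consults the host link. */
static struct queued_cmd *pio_task_qc_host_link(struct port *ap)
{
	return qc_from_tag(ap, ap->link.active_tag);
}

/* Lookup through the remembered link, as the proposed patch would do. */
static struct queued_cmd *pio_task_qc_saved_link(struct port *ap)
{
	struct link *link = ap->pio_task_link;

	return link ? qc_from_tag(ap, link->active_tag) : NULL;
}
```

In this model, after issue_on_pmp_link(&ap, 3, 0) the host-link lookup
returns NULL, so the PIO task returns early and the command can only
time out, while the lookup through the saved link finds the command.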

Gwendal.



On Fri, Aug 27, 2010 at 12:29 PM, Gwendal Grignou <gwendal@xxxxxxxxxx> wrote:
> I include the trace. You need Windows and LeCroy SATASuite to look
> at the trace.
>
> Looking at extended traces, the IDENTIFY never moves in the HSM:
>
> Aug 27 12:07:53 halab-59 kernel: [  548.613529] ata_sff_flush_pio_task: ENTER
> Aug 27 12:07:53 halab-59 kernel: [  548.613561] ata_sff_exec_command:
> ata13: cmd 0xE4
> Aug 27 12:07:53 halab-59 kernel: [  548.613577] ata_sff_hsm_move:
> ata13: protocol 1 task_state 3 (dev_stat 0x50)
> Aug 27 12:07:53 halab-59 kernel: [  548.613580] ata_sff_hsm_move:
> ata13: dev 0 command complete, drv_stat 0x50
> Aug 27 12:07:53 halab-59 kernel: [  548.613605] ata_sff_flush_pio_task: ENTER
> Aug 27 12:07:53 halab-59 kernel: [  548.613637] ata_sff_exec_command:
> ata13: cmd 0xE4
> Aug 27 12:07:53 halab-59 kernel: [  548.613654] ata_sff_hsm_move:
> ata13: protocol 1 task_state 3 (dev_stat 0x50)
> Aug 27 12:07:53 halab-59 kernel: [  548.613656] ata_sff_hsm_move:
> ata13: dev 0 command complete, drv_stat 0x50
> Aug 27 12:07:53 halab-59 kernel: [  548.613681] ata_sff_flush_pio_task: ENTER
> Aug 27 12:07:53 halab-59 kernel: [  548.613688]
> ata_eh_revalidate_and_attach: ENTER
> Aug 27 12:07:53 halab-59 kernel: [  548.613691]
> ata_eh_revalidate_and_attach: ENTER
> Aug 27 12:07:53 halab-59 kernel: [  548.613693]
> ata_eh_revalidate_and_attach: ENTER
> Aug 27 12:07:53 halab-59 kernel: [  548.613694]
> ata_eh_revalidate_and_attach: ENTER
> Aug 27 12:07:53 halab-59 kernel: [  548.613725] ata_sff_exec_command:
> ata13: cmd 0xEC
> ...
> Aug 27 12:08:23 halab-59 kernel: [  578.613048] __ata_port_freeze:
> ata13 port frozen
> Aug 27 12:08:23 halab-59 kernel: [  578.613058] ata13.03: qc timeout (cmd 0xec)
> Aug 27 12:08:23 halab-59 kernel: [  578.613062] ata13.03: failed to
> IDENTIFY (I/O error, err_mask=0x4)
>
>
> On Fri, Aug 27, 2010 at 1:19 AM, Gwendal Grignou <gwendal@xxxxxxxxxx> wrote:
>> I can reproduce the problem on upstream Linux using a PC with a Marvell
>> 7042 controller and a Sil3726 PMP. Without the Sil3726, it works fine.
>>
>> What I can see on the SATA analyzer [I will send a clean trace tomorrow]
>> is that the disk sends the DATA FIS back to the PMP, but the PMP does not
>> manage to have the data accepted by the host.
>>
>> Non-data commands work fine.
>>
>> In dmesg:
>> [10058.404047] ata29: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> [10058.404742] ata29.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6
>> ports, feat 0x1/0x9
>> [10058.405250] ata29.00: hard resetting link
>> [10058.809151] ata29.00: link resume succeeded after 1 retries
>> [10058.911613] ata29.01: hard resetting link
>> [10059.217572] ata29.02: hard resetting link
>> [10059.523572] ata29.03: hard resetting link
>> [10059.829505] ata29.04: hard resetting link
>> [10060.134572] ata29.05: hard resetting link
>> [10065.440079] ata29.03: qc timeout (cmd 0xec)
>> [10065.440085] ata29.03: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [10065.440092] ata29.15: hard resetting link
>> [10065.947049] ata29.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> [10065.947737] ata29.00: hard resetting link
>> [10066.352106] ata29.00: link resume succeeded after 1 retries
>> [10066.454616] ata29.01: hard resetting link
>> [10066.760591] ata29.02: hard resetting link
>> [10067.066570] ata29.03: hard resetting link
>> [10067.372505] ata29.05: hard resetting link
>> [10077.677071] ata29.03: qc timeout (cmd 0xec)
>> [10077.677076] ata29.03: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [10077.677081] ata29.15: hard resetting link
>> [10078.184073] ata29.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> [10078.184783] ata29.00: hard resetting link
>> [10078.589185] ata29.00: link resume succeeded after 1 retries
>> [10078.691593] ata29.01: hard resetting link
>> [10078.997592] ata29.02: hard resetting link
>> [10079.303571] ata29.03: hard resetting link
>> [10079.609505] ata29.04: hard resetting link
>> [10079.914572] ata29.05: hard resetting link
>> [10110.220078] ata29.03: qc timeout (cmd 0xec)
>> [10110.220083] ata29.03: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [10110.220087] ata29.03: failed to recover link after 3 tries, disabling
>> [10110.220094] ata29.15: hard resetting link
>> [10110.727044] ata29.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> [10111.033375] ata29.00: hard resetting link
>> [10111.338571] ata29.01: hard resetting link
>> [10111.644591] ata29.02: hard resetting link
>> [10111.950594] ata29.05: hard resetting link
>> [10112.256643] ata29: EH complete
>>
>>
>> Gwendal.
>>
>>
>>
>>
>> On Sun, Aug 22, 2010 at 12:57 PM, Jeff Garzik <jeff@xxxxxxxxxx> wrote:
>>> On 08/22/2010 03:54 PM, Jeff Garzik wrote:
>>>>
>>>> On 08/21/2010 02:52 PM, Nicolas Jungers wrote:
>>>>>
>>>>> My ARM box fails to use my eSATA port multiplier (Addonics,
>>>>> Sil3726 based). It worked well with 2.6.34.1 and 2.6.34.4 but not
>>>>> with either 2.6.35.2 or 2.6.35.3. I haven't tested other kernels.
>>>>>
>>>>> The kernels are from http://sheeva.with-linux.com/sheeva/ with for
>>>>> example the following config
>>>>> http://sheeva.with-linux.com/sheeva/2.6.35.3/sheeva-2.6.35.3.config
>>>>>
>>>>> The symptom, seen on the console, is a loop on the eSATA links. Here
>>>>> is the start of it:
>>>>>
>>>>> ata2: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
>>>>> ata2: edma_err_cause=00000010 pp_flags=00000000, dev connect
>>>>> ata2: SError: { PHYRdyChg DevExch }
>>>>> ata2: hard resetting link
>>>>> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
>>>>> ata2.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
>>>>
>>>> Can you post or link to the entire dmesg?
>>>>
>>>> Notably, we need to see the probe messages to determine what SATA chip
>>>> you are using... From the edma_err_cause config I'd guess sata_mv, but
>>>> more info would be useful.
>>>
>>> Nevermind, I see the reply (it got auto-sorted into the wrong folder
>>> locally).
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>