Re: [PATCH] ata: libata-sata: retry hardreset when device detected but PHY not established

From: Damien Le Moal

Date: Mon Apr 27 2026 - 00:46:17 EST


On 4/27/26 10:51 AM, yangxingui wrote:
>
>
> On 2026/4/26 6:53, Damien Le Moal wrote:
>> On 4/25/26 15:04, Xingui Yang wrote:
>>> When sata_link_hardreset() detects that the link is offline, it currently
>>> returns immediately without distinguishing the reason. According to SATA
>>> specification, the SStatus register's DET field (bits 0-3) indicates:
>>>    - 0x0: No device detected, PHY not communicating
>>>    - 0x1: Device detected but PHY communication not established
>>>    - 0x3: Device detected and PHY communication established
>>>
>>> To improve device detection reliability, add a check: when the link is
>>> offline but the DET field shows 0x1, return -EAGAIN to trigger a retry
>>> rather than giving up immediately.
>>>
>>> Signed-off-by: Xingui Yang <yangxingui@xxxxxxxxxx>
>>
>> This is a pure ATA patch so please CC the linux-ide list, not the linux-scsi
>> list.
>
> Ok.
>>
>> Also, please check your mail setup: your email was in my Junk folder.
>
> Well, the patch was sent using the git send command.

The problem is not git send-email but your SMTP server. Its DMARC setup is
probably wrong: all your emails end up in my junk folder.

>> This is preceded by a call to sata_link_resume(), which calls
>> sata_link_debounce(), and that function makes sure that DET is stable. So if
>> after that DET still shows that there is no PHY, there is likely a serious
>> problem with the link and it is very slow to be established.
>>
>> In this case, I do not think that doing another hardreset is the right thing to
>> do. Have you tried increasing the deadline for hardreset ? That deadline is used
>> as the limit for the link debounce too.
>>
>> Do you have a specific controller/device where you see this issue ? What exactly
>> is the hardware setup where you see this issue ?
>
> Our customer is importing and verifying a new disk. A hard reset of the disk
> occasionally fails, and no exception log is generated for resume and
> debounce.

Does this hold for all disks or for only one or some models ?

>
> [   22.864418][ T1285] ahci 0000:76:03.0: Adding to iommu group 23
> [   22.870403][ T1285] ahci 0000:76:03.0: controller does not support SXS, disabling CAP_SXS
> [   22.878655][ T1285] ahci 0000:76:03.0: SSS flag set, parallel bus scan disabled
> [   22.885966][ T1285] ahci 0000:76:03.0: AHCI 0001.0300 32 slots 2 ports 6 Gbps 0x3 impl SATA mode
> [   22.894743][ T1285] ahci 0000:76:03.0: flags: 64bit ncq sntf stag pm led clo only pmp fbs slum part ccc ems boh
> [   22.905277][ T1285] scsi host0: ahci
> [   22.909061][ T1285] scsi host1: ahci
> [   22.966463][ T1285] ata1: SATA max UDMA/133 abar m4096@0xa3010000 port 0xa3010100 irq 108
> [   22.974629][ T1285] ata2: SATA max UDMA/133 abar m4096@0xa3010000 port 0xa3010180 irq 109
> [   25.242373][ T1286] ata1: SATA link down (SStatus 1 SControl 300) <==============
> [   25.659901][ T1288] ata2: SATA link down (SStatus 0 SControl 300)
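
For reference when reading the two "link down" lines above: assuming the DET
encoding listed in the patch description (DET in the low 4 bits of SStatus),
the values can be decoded with a small standalone sketch like the one below.
This is not kernel code. The names sstatus_det(), det_to_string() and the
enum constants are made up for illustration only.

```c
#include <stdint.h>

/*
 * Hypothetical names, for illustration only. The three values are the
 * ones listed in the patch description; anything else is reported as
 * "other".
 */
enum det_state {
	DET_NO_DEVICE     = 0x0, /* no device detected, PHY not communicating */
	DET_DEVICE_NO_PHY = 0x1, /* device detected, PHY communication not established */
	DET_DEVICE_PHY_UP = 0x3, /* device detected, PHY communication established */
};

/* Extract the DET field, i.e. the low 4 bits of SStatus. */
static inline uint32_t sstatus_det(uint32_t sstatus)
{
	return sstatus & 0xf;
}

/* Map a DET value to a short human-readable description. */
static inline const char *det_to_string(uint32_t det)
{
	switch (det) {
	case DET_NO_DEVICE:
		return "no device, no PHY communication";
	case DET_DEVICE_NO_PHY:
		return "device detected, PHY communication not established";
	case DET_DEVICE_PHY_UP:
		return "device detected, PHY communication established";
	default:
		return "other";
	}
}
```

With this, the ata1 value from the log (SStatus 1) decodes to DET 0x1, which is
the case the patch wants to retry, and the ata2 value (SStatus 0) decodes to
DET 0x0.
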
>>
>>
>>
>>> +        u32 sstatus;
>>> +
>>> +        if (sata_scr_read(link, SCR_STATUS, &sstatus) == 0 &&
>>> +            (sstatus & 0xf) == 0x1) {
>>> +            ata_link_warn(link, "device detected but PHY not ready (SStatus %X), retrying\n",
>>> +                      sstatus);
>>> +            rc = -EAGAIN;
>>> +        }
>>> +
>>>           goto out;
>>> +    }
>>>         /* Link is online.  From this point, -ENODEV too is an error. */
>>>       if (online)
>>
>>
>
> Thanks,
> Xingui
>


--
Damien Le Moal
Western Digital Research