Re: [REGRESSION][BISECTED] BU40N Blu-Ray drive broken since 7627a0edef54

From: Niklas Cassel
Date: Mon Jan 13 2025 - 11:29:10 EST


Hello Christian,

On Sat, Jan 11, 2025 at 05:41:10PM +0100, Christian Heusel wrote:
> On 25/01/10 05:21PM, Christian Heusel wrote:
> > On 25/01/10 12:24PM, Niklas Cassel wrote:
> > > On Fri, Jan 10, 2025 at 10:04:46AM +0100, Christian Heusel wrote:
> > >
> > > > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> > > > index c085dd8..ef01ccd 100644
> > > > --- a/drivers/ata/libata-core.c
> > > > +++ b/drivers/ata/libata-core.c
> > > > @@ -4118,6 +4118,9 @@ static const struct ata_dev_quirks_entry __ata_dev_quirks[] = {
> > > > { "SAMSUNG MZ7TD256HAFV-000L9", NULL, ATA_QUIRK_NOLPM },
> > > > { "SAMSUNG MZ7TE512HMHP-000L1", "EXT06L0Q", ATA_QUIRK_NOLPM },
> > > >
> > > > + /* Hitachi-LG Data Storage models with LPM issues */
> > > > + { "HL-DT-ST BD-RE BU40N", NULL, ATA_QUIRK_NOLPM },
> > > > +
> > > > /* devices that don't properly handle queued TRIM commands */
> > > > { "Micron_M500IT_*", "MU01", ATA_QUIRK_NO_NCQ_TRIM |
> > > > ATA_QUIRK_ZERO_AFTER_TRIM },
> > > >
> > > > So if anyone has feedback on why the patch does not work or any
> > > > alternative ideas for a solution that would be highly appreciated!
>
> So weirdly the quirk from the patch for the same kernel _does work_ when
> the user passes the kernel parameter "ahci.mobile_lpm_policy=1":
>
> [ 24.035361] ata4.00: Model 'HL-DT-ST BD-RE BU40N', rev '1.03', applying quirks: nolpm
>
> Any idea why that could be? Is it maybe not the device itself but the
> controller that it connects to that is borked and in need of the quirk?

So from the 6.6.69-1-lts log, which you claim is working, we see this:
[ 0.447942] ata3: SATA max UDMA/133 abar m2048@0xa091a000 port 0xa091a200 irq 126
[ 0.447945] ata4: SATA max UDMA/133 abar m2048@0xa091a000 port 0xa091a280 irq 126
[ 0.447948] ata5: SATA max UDMA/133 abar m2048@0xa091a000 port 0xa091a300 irq 126
[ 0.447950] ata6: SATA max UDMA/133 abar m2048@0xa091a000 port 0xa091a380 irq 126
[ 0.756553] ata6: SATA link down (SStatus 4 SControl 300)
[ 0.759958] ata4: SATA link down (SStatus 0 SControl 300)
[ 0.760021] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 0.760084] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)

Since this is running AHCI, default reset is ahci_do_hardreset() (COMRESET).

The SATA link up/link down messages are printed after resetting each port.
We can see that we get link up on ata3 and ata5, but link is down on ata6
and ata4.

Seeing link down after a reset is not correct, and I think we should first
focus on seeing link up for ata4, before knowing if we also need a quirk or
not.


[ 0.762876] ata5.00: LPM support broken, forcing max_power
[ 0.763000] ata5.00: supports DRM functions and may not be fully accessible
[ 0.763007] ata5.00: ATA-9: SAMSUNG MZ7TD256HAFV-000L9, DXT04L6Q, max UDMA/133
[ 0.763242] ata5.00: NCQ Send/Recv Log not supported
[ 0.763248] ata5.00: 500118192 sectors, multi 1: LBA48 NCQ (depth 32), AA
[ 0.765038] ata5.00: Features: Trust Dev-Sleep
[ 0.765349] ata5.00: LPM support broken, forcing max_power
[ 0.765471] ata5.00: supports DRM functions and may not be fully accessible
[ 0.765714] ata5.00: NCQ Send/Recv Log not supported
[ 0.766391] ata3.00: ATA-8: WDC WD20EARX-00PASB0, 51.0AB51, max UDMA/133
[ 0.766999] ata3.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 32), AA
[ 0.767029] ata5.00: configured for UDMA/133
[ 0.773655] ata3.00: configured for UDMA/133
[ 81.230713] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 81.235708] ata4.00: ATAPI: HL-DT-ST BD-RE BU40N, 1.03, max UDMA/133
[ 81.240789] ata4.00: configured for UDMA/133

After 81 seconds! we see link up on ata4...

Looking at your dmesg-6.13-rc6-mainline.log, we see similar result,
link down at initial scan, and then link up way, way later.

Right now I don't know why your device actually manages to get link up after
so long. Most likely this is from a suprious hotplug event, which can happen
when LPM is enabled in the device, but not masked in the controller.
(Linux only masks the interrupt if a LPM policy other than Max Power is set,
so it will have been enabled on v6.6.)

But like I said, we should first focus on getting link up after a reset.


I think the problem might be one out of two things:

1) The device is already in DevSleep state before the reset.
While resetting the port will bring the device out of DevSleep,
I found the following in AHCI 1.3.1 specification:

"""
Note: It is recommended that software explicitly set PxCMD.ICC to ‘1h’
and wait PxDEVSLP.DETO + PxDEVSLP.MDAT + 1ms before initiating a COMRESET.
If software initiates a COMRESET to exit from the DevSleep interface state,
PxSCTL.DET shall be set to ‘1h’ (at a minimum) for the amount of time
specified by PxDEVSLP.DETO + PxDEVSLP.MDAT + 1ms.
"""

Right now, Linux does neither set PxCMD.ICC to 1 before the COMRESET,
nor do we wait for PxDEVSLP.DETO + PxDEVSLP.MDAT + 1ms, while PxSCTL.DET in
set to 1. (We currently only hold it for 1ms.)


Could you please try this patch?

diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
index 9c76fb1ad2ec..b0bff5fe3e03 100644
--- a/drivers/ata/libata-sata.c
+++ b/drivers/ata/libata-sata.c
@@ -660,7 +660,7 @@ int sata_link_hardreset(struct ata_link *link, const unsigned int *timing,
/* Couldn't find anything in SATA I/II specs, but AHCI-1.1
* 10.4.2 says at least 1 ms.
*/
- ata_msleep(link->ap, 1);
+ ata_msleep(link->ap, 20 + 10 + 1);

/* bring link back */
rc = sata_link_resume(link, timing, deadline);



If that works to get link up after reset, you can continue to see if the
LPM quirk is needed or not. (And I will need to write a proper fix so
that we sleep the correct amount of time while holding PxSCTL.DET in 1).


Kind regards,
Niklas