Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts

From: Martin Steigerwald
Date: Wed Mar 14 2018 - 08:48:56 EST


Hans de Goede - 14.03.18, 12:05:
> Hi,
>
> On 14-03-18 12:01, Martin Steigerwald wrote:
> > Hans de Goede - 11.03.18, 15:37:
> >> Hi Martin,
> >>
> >> On 11-03-18 09:20, Martin Steigerwald wrote:
> >>> Hello.
> >>>
> >>> Since 4.16-rc4 (upgraded from 4.15.2 which worked) I have an issue
> >>> with SMART checks occasionally failing like this:
> >>>
> >>> smartd[28017]: Device: /dev/sdb [SAT], is in SLEEP mode, suspending
> >>> checks
> >>> udisksd[24408]: Error performing housekeeping for drive
> >>> /org/freedesktop/UDisks2/drives/INTEL_SSDSA2CW300G3_[…]: Error updating
> >>> SMART data: Error sending ATA command CHECK POWER MODE: Unexpected sense
> >>> data returned:#0120000: 0e 09 0c 00 00 00 ff 00 00 00 00 00 00 00 50
> >>> 00 ..............P.#0120010: 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>> 00 00 00 ................#012 (g-io-error-quark, 0)
> >>> merkaba udisksd[24408]: Error performing housekeeping for drive
> >>> /org/freedesktop/UDisks2/drives/Crucial_CT480M500SSD3_[…]: Error updating
> >>> SMART data: Error sending ATA command CHECK POWER MODE: Unexpected sense
> >>> data returned:#0120000: 01 00 1d 00 00 00 0e 09 0c 00 00 00 ff 00 00
> >>> 00 ................#0120010: 00 00 00 00 50 00 00 00 00 00 00 00
> >>> 00 00 00 00 ....P...........#012 (g-io-error-quark, 0)
> >>>
> >>> (Intel SSD is connected via SATA, Crucial via mSATA in a ThinkPad T520)
> >>>
> >>> However when I then check manually with smartctl -a | -x | -H the device
> >>> reports SMART data just fine.
> >>>
> >>> As smartd correctly detects that the device is in sleep mode, this may
> >>> be a userspace issue in udisksd.
> >>>
> >>> Also, on some boot attempts the boot hangs with a message like "could
> >>> not connect to lvmetad, scanning manually for devices". I use BTRFS
> >>> RAID 1 on two LVs (each on one of the SSDs). This configuration requires
> >>> a manual adaptation of the initramfs in order to boot (basically
> >>> vgchange -ay before btrfs device scan).
> >>>
> >>> I wonder whether that has to do with the new SATA LPM policy stuff, but
> >>> as I had issues with
> >>>
> >>> 3 => Medium power with Device Initiated PM enabled
> >>>
> >>> (the machine did not boot, which could also have been caused by me
> >>> accidentally removing all TCP/IP network support in the kernel with
> >>> that setting)
> >>>
> >>> I set it back to
> >>>
> >>> CONFIG_SATA_MOBILE_LPM_POLICY=0
> >>>
> >>> (firmware settings)
> >>
> >> Right, so with that setting the LPM policy changes are effectively
> >> disabled and cannot explain your SMART issues.
> >>
> >> Still I would like to zoom in on this part of your bug report, because
> >> for Fedora 28 we are planning to ship with
> >> CONFIG_SATA_MOBILE_LPM_POLICY=3
> >> and AFAIK Ubuntu has similar plans.
> >>
> >> I suspect that the issue you were seeing with
> >> CONFIG_SATA_MOBILE_LPM_POLICY=3 was with the Crucial disk? I've attached
> >> a patch for you to test, which disables LPM for your model of Crucial SSD
> >> (but keeps it on for the Intel disk). If you can confirm that with that
> >> patch you can run with CONFIG_SATA_MOBILE_LPM_POLICY=3 without issues,
> >> that would be great.
> >
> > With 4.16-rc5 with CONFIG_SATA_MOBILE_LPM_POLICY=3 the system successfully
> > booted three times in a row. So feel free to add tested-by.
>
> Thanks.
>
> To be clear, I assume you're talking about 4.16-rc5 with the patch I made
> to blacklist the Crucial disk, not just plain 4.16-rc5, right?

4.16-rc5 with your

0001-libata-Apply-NOLPM-quirk-to-Crucial-M500-480GB-SSDs.patch

patch.
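
For anyone who wants to double-check which LPM policy their SATA hosts
actually ended up with at runtime, here is a minimal sketch in Python. It
assumes the policy is exposed per host as
/sys/class/scsi_host/host*/link_power_management_policy. The mapping from
CONFIG_SATA_MOBILE_LPM_POLICY values to policy names is my understanding of
4.16; only the values 0 and 3 are discussed in this thread. The function and
variable names are made up for illustration.

```python
#!/usr/bin/env python3
"""Print the SATA link power management (LPM) policy of every SCSI host."""
from pathlib import Path

# Assumed mapping of CONFIG_SATA_MOBILE_LPM_POLICY values to the policy
# names shown in sysfs (as understood for 4.16; not taken from this thread,
# apart from 0 = firmware settings and 3 = medium power with DIPM).
LPM_POLICY_NAMES = {
    0: "keep_firmware_settings",
    1: "max_performance",
    2: "medium_power",
    3: "med_power_with_dipm",
    4: "min_power",
}


def read_lpm_policies(sysfs_root="/sys/class/scsi_host"):
    """Return {host_name: policy_string} for hosts that expose an LPM policy."""
    policies = {}
    for host in sorted(Path(sysfs_root).glob("host*")):
        policy_file = host / "link_power_management_policy"
        # Hosts without this attribute (for example non-AHCI ones) are skipped.
        if policy_file.is_file():
            policies[host.name] = policy_file.read_text().strip()
    return policies


if __name__ == "__main__":
    for host, policy in read_lpm_policies().items():
        print(f"{host}: {policy}")
```

With CONFIG_SATA_MOBILE_LPM_POLICY=3 on a mobile chipset I would expect the
hosts to report med_power_with_dipm, though a port whose disk is covered by
the NOLPM quirk may behave differently.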

Thanks,
--
Martin