RE: Regression on Dell XPS13 (was: [char-misc for 4.10-rc4 V2] mei: bus: enable OS version only for SPT and newer)

From: Mario.Limonciello
Date: Tue Jan 17 2017 - 11:58:57 EST


Hi Paul,

Thanks for raising this topic and including me.
Suspend to Idle support in Linux as an alternative S3 on x86 is a new
topic. In all, I expected that some problems would arise as a result
of this patch, and I hope they will spur interesting discussions and
solutions.

Something that might not be immediately obvious is that by supporting
suspend to idle, new driver bugs are going to be identified for drivers
that don't properly put hardware into low enough power modes for the CPU
to go into the proper state. Traditionally with S3 the BIOS would be responsible
for powering devices down before S3 and back up when exiting S3.
This onus is now on the OS.

> -----Original Message-----
> From: Paul Menzel [mailto:pmenzel@xxxxxxxxxxxxx]
> Sent: Tuesday, January 17, 2017 8:34 AM
> To: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>; Limonciello, Mario
> <Mario_Limonciello@xxxxxxxx>
> Cc: Thorsten Leemhuis <linux@xxxxxxxxxxxxx>; Greg Kroah-Hartman
> <gregkh@xxxxxxxxxxxxxxxxxxx>; Tomas Winkler <tomas.winkler@xxxxxxxxx>;
> Jan Niehusmann <jan@xxxxxxxxxx>; Usyskin, Alexander
> <alexander.usyskin@xxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; Chen, Yu C
> <yu.c.chen@xxxxxxxxx>; Sarvela, Tomi P <tomi.p.sarvela@xxxxxxxxx>; Daniel
> Blueman <daniel@xxxxxxxxx>; Len Brown <len.brown@xxxxxxxxx>; linux-
> pm@xxxxxxxxxxxxxxx
> Subject: Regression on Dell XPS13 (was: [char-misc for 4.10-rc4 V2] mei: bus:
> enable OS version only for SPT and newer)
>
> Dear Rafael, dear Mario,
>
>
> On 01/17/17 09:14, Thorsten Leemhuis wrote:
> > Greg Kroah-Hartman wrote on 16.01.2017 12:05:
> >> On Sun, Jan 15, 2017 at 11:58:30AM +0100, Greg Kroah-Hartman wrote:
> >>> On Sun, Jan 15, 2017 at 07:19:03AM +0000, Winkler, Tomas wrote:
> >>>>> Subject: Re: [char-misc for 4.10-rc4 V2] mei: bus: enable OS
> >>>>> version only for SPT and newer On Sat, Jan 14, 2017 at 08:27:31PM
> >>>>> +0100, Paul Menzel wrote:
> >>>>>> On 2017-01-13 14:00, Greg Kroah-Hartman wrote:
> >>>>>>> On Wed, Jan 11, 2017 at 03:26:06PM +0100, Paul Menzel wrote:
> >>>>>>>> On 01/11/17 15:12, Winkler, Tomas wrote:
> >>>>>>>>>>> On 01/11/17 10:24, Winkler, Tomas wrote:
> >>>>>>>>>>>>> On Wed, Jan 11, 2017 at 01:27:21AM +0200, Tomas Winkler
> wrote:
> >>>>>>>>>>>>>> On older platforms the command should be just ignored
> by
> >>>>>>>>>>>>>> the firmware but some older platforms misbehave so it's
> >>>>>>>>>>>>>> safer to send the command only if required.
> >>>>>>>>>>>>> Thanks! This fixes suspend-to-ram for me (on a Thinkpad
> x201s).
> >>>>>>>>>>>> What about Dell XPS13?
>
> There is a regression with Linux 4.10-rc{1,2,3} on the Intel Kaby Lake device
> Dell XPS13 (9360). Hitting the power button, the light of the power button
> never goes off. Pressing it again, nothing happen. Only pressing it like ten
> seconds the screen comes back for five seconds, and then it seems to try to
> suspend again.


When it's in suspend to idle, the EC will modify some behaviors that align
to vendor specifications. These include both the power button and
the power light. Both are actually expected, but the power button is
indicative of other problems in the stack that need work.
Let me explain how it all works:

When in suspend to idle, if the power button is pressed only momentarily (~ <6s),
the EC sends a wakeup event to the firmware which sends it to the OS
through an ACPI event to the Intel HID event filter driver (intel-hid).

If it's pressed for longer (~6-9 seconds) then an IRQ is triggered to wake up
the system the traditional way through a power button press.

If the button is pressed for a very long time (~10+ seconds) then
the system will be force powered off.

So in the <6s scenario, the intel-hid driver is responsible to receive the ACPI
event and process accordingly. The maintainer has a patch ready for the intel-hid
portion of this work, but it's currently being reviewed by Intel to ensure it
can be legally submitted into the kernel.

After that patch is approved for disclosure, submitted and accepted, the other
missing part is that the ACPI subsystem itself is frozen when in suspend to idle,
so this event can't be received by intel-hid. This part needs discussion on how to fix.

In the 6-9s scenario there are no problems as this is the traditional wakeup
(as you found).

>
> >>>>>>>>>>> With Linus' master branch from today, and Greg's
> >>>>>>>>>>> char-misc-linus merged
> >>>>>>>>>>> (Merge: 807b93e995d1 546cf3ef9c92), the regression is still
> there.
> >> [â]
> >>>>>>>> I am currently bisecting to find the culprit. 13 steps will
> >>>>>>>> take some time though.
> >>>>>>> I can duplicate this on my laptop here as well :(
> >>>>>> Which system do you have?
> >>>>> A Dell XPS13, don't know what cpu type it is, here's the output of
> >>>>> one cpu from /proc/cpuinfo
> >> [â]
> >>>>> model name : Intel(R) Core(TM) i7-6560U CPU @ 2.20GHz
> >>>>>>> Did you get anywhere with your bisection?
> >>>>>> Sorry, I replied to a different message with my status.
> >>>> You are close! I'll try bisection tomorrow if I have some spare time.
> >
> > Paul, did you get any closer? I have trouble finding time for a proper
> > bisection (it's a bit chaotic here right now :-/ )
> >
> >>>> Greg, is that same Laptop mode as Paul's, you've experience the issue
> on?
> >>> It's the same model name, but as this model has been shipped with
> >>> many different CPU versions over the years, I'm not sure if it is
> >>> the exact same one.
> >
> > Gregs afaics is Skylake generation (XPS 13 (9350)), Paul and I have a
> > Kaby Lake (XPS 13 (9360)), which is one generation newer (Wifi is also
> > different; other components likely as well).
> >
> >> Ok, 4.10-rc4 seems to have fixed this issue with me. I don't know
> >> what it was, but I can't duplicate it anymore.
> >
> > Good to hear.
> >
> >> Paul, are you still having this issue?
> >
> > Don't know about Paul, but I did a quick test with rc4 on my machine
> > and the issue is still there :-/
>
> I didnât test Linux 4.10-rc4 yet, but I completed the bisection.
>
> ```
> 406e79385f3223d82272cf2be86bc95cd000a258 is the first bad commit commit
> 406e79385f3223d82272cf2be86bc95cd000a258
> Author: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> Date: Mon Nov 21 22:45:40 2016 +0100
>
> PM / sleep: System sleep state selection interface rework
>
> There are systems in which the platform doesn't support any special
> sleep states, so suspend-to-idle (PM_SUSPEND_FREEZE) is the only
> available system sleep state. However, some user space frameworks
> only use the "mem" and (sometimes) "standby" sleep state labels, so
> the users of those systems need to modify user space in order to be
> able to use system suspend at all and that may be a pain in practice.
>
> Commit 0399d4db3edf (PM / sleep: Introduce command line argument for
> sleep state enumeration) attempted to address this problem by adding
> a command line argument to change the meaning of the "mem" string in
> /sys/power/state to make it trigger suspend-to-idle (instead of
> suspend-to-RAM).
>
> However, there also are systems in which the platform does support
> special sleep states, but suspend-to-idle is the preferred one anyway
> (it even may save more energy than the platform-provided sleep states
> in some cases) and the above commit doesn't help in those cases.
>
> For this reason, rework the system sleep state selection interface
> again (but preserve backwards compatibiliby). Namely, add a new
> sysfs file, /sys/power/mem_sleep, that will control the system
> suspend mode triggered by writing "mem" to /sys/power/state (in
> analogy with what /sys/power/disk does for hibernation). Make it
> select suspend-to-RAM ("deep" sleep) by default (if supported) and
> fall back to suspend-to-idle ("s2idle") otherwise and add a new
> command line argument, mem_sleep_default, allowing that default to
> be overridden if need be.
>
> At the same time, drop the relative_sleep_states command line
> argument that doesn't make sense any more.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> Tested-by: Mario Limonciello <mario.limonciello@xxxxxxxx>
>
> :040000 040000 a5770fe0413cbe7794eab28f72dfe8ede1f090c2
> a2882c77659517aa7137c930e0e7f178bc76bfbd M Documentation
> :040000 040000 2594b1a87815173e97199ef9d9c5918fec22fcfd
> fe0b69953be653644c5366ac631131fdbbdb9bcc M kernel
> $ git tag --contains 406e79385f3223d82272cf2be86bc95cd000a258
> v4.10-rc1
> v4.10-rc2
> v4.10-rc3
> ```
>
> Please find the config and the bisection log attached. Sometimes I had to
> cherry-pick build fix commits for PIE enabled GCC on top.
>
> Rafael, do you want me to open a bug report for that? Mario, what systems
> did you actually test this on? (Why isnât that listed in the commit message?)
> Mario, do you have access to Dell XPS13 (9360) devices to help getting this
> fixed?
>

I tested this on several systems and ensured that the kernel was doing the
right thing in terms of choosing the correct state to go into.

As I mentioned above, those behaviors are currently expected until these
types issues are identified and fixed in the proper subsystems. If they
end up being troublesome to resolve, it's possible to quirk individual systems,
to disable this behavior and return to traditional S3 but I would prefer to actually
identify and fix the various problems so that we can push the Linux stack forward.