RE: [PATCH v2 3/3] ACPI / button: Send "open" state after boot/resume

From: Zheng, Lv
Date: Tue May 31 2016 - 21:17:27 EST


> From: Benjamin Tissoires [mailto:benjamin.tissoires@xxxxxxxxx]
> Subject: Re: [PATCH v2 3/3] ACPI / button: Send "open" state after
> boot/resume
> Hi Lv,
> On Tue, May 31, 2016 at 4:55 AM, Zheng, Lv <lv.zheng@xxxxxxxxx> wrote:
> > Hi,
> >
> >> From: Benjamin Tissoires [mailto:benjamin.tissoires@xxxxxxxxx]
> >> Subject: Re: [PATCH v2 3/3] ACPI / button: Send "open" state after
> >> boot/resume
> >>
> >> On Fri, May 27, 2016 at 9:16 AM, Lv Zheng <lv.zheng@xxxxxxxxx>
> wrote:
> [snipped
> ]>> As Valdis replied on 0/3, I don't think this is a good solution (even
> >> temporary). Linux should not assume the current state of a input
> >> device, and sending unconditionally 1 here is wrong. If the device is
> >> on a docking station, you will wake up the wrong monitor and screw
> the
> >> user session (and this will be a regression).
> > [Lv Zheng]
> > We are doing the test to see how this behaves on several different
> platforms.
> >
> >>
> >> How about we simply send the current LID state stored in the ACPI?
> >> something like calling acpi_lid_send_state() directly?
> > [Lv Zheng]
> > This is what we are going to eliminate in [PATCH 01].
> > We have several real bugs related to sending a wrong state to the
> userspace.
> > Userspace will suspend right after resume because of the 'close' state.
> On the other hand, you are trying to remove 23de5d9ef2a4bbc4f733f, a
> patch that has been around for 9 years and we only start seeing
> devices where this logic is not working...
> I am not saying your approach is wrong, I am just saying that instead
> of a plain revert, we should probably be more conservative and add a
> quirk for those buggy machines. Ideally, we should try to understand
> why there is such an issue that Windows doesn't have (the solution
> might just be that given Windows doesn't care, we are screwed).
> BTW, on the Surface 3, there is a WMI
> (f7cc25ec-d20b-404c-8903-0ed4359c18ae -> WQHE) which returns the
> actual value of the LID, without using PNP0C0D at all. I have a
> feeling that Windows might use it when it is in trouble or in an
> unsure state. I couldn't find this WMI on the 2 other systems so that
> may also be just a one shot for the Surface 3.
[Lv Zheng]
Thanks for the information.
But it seems Surface 3 is not a good example for this.
It is a runtime idle platform.
And the root cause of the Surface 3 issue should be in the freeze code.
After waking the system up via LID irq, the irq is dropped.
But I guess it is risky to invoke irq handler right there, doing so could break may existing drivers.

> [snipped]
> > [Lv Zheng]
> > The understanding here is incorrect.
> > We have 3 bogus devices.
> > 1 of them is surface 3 which is a hardware reduced platform.
> > The others are all traditional platforms.
> >
> > =====
> > The facts are:
> >
> > Both the platforms return cached lid state from _LID.
> > The cached value will be updated by lid irq (via GPIO IRQ, GPE, or EC
> event).
> > AML tables will send lid notification in the irq handler.
> >
> > Some AML tables will update the cached value in _WAK (I'll describe why
> it is necessary below).
> > But updating the cached value in _WAK is not guaranteed by all AML
> tables.
> >
> > For the 'close' state irq, all tables will send lid close notification.
> > For the 'open' state irq, it seems there are tables never sending lid open
> notification (sounds like Windows do not care about lid open).
> > =====
> >
> > Surface 3 is entirely a different case.
> > It is a runtime idle system and hardware reduced.
> > On that kind of system, lid open is handled by OS not by BIOS.
> > Surface 3 is exactly the platform that doesn't send lid open notification.
> > I guess the AML is intentionally written in this way to be compliant to
> the traditional platforms.
> >
> > While on the traditional platforms:
> > When lid is opened, BIOS handles the lid irq and wakes the system from
> the FACS waking vector.
> > So it is likely that there is no lid open irq after the system is resumed.
> > BIOS may forget to update the cached lid value in the _WAK or some
> other control methods that could be executed after resuming.
> > Then if we send _LID result to the user space, the cached value could
> apparently be 'close'.
> >
> > That explains why there is no "lid open" configuration in the "Windows
> Device Manager".
> >
> >>
> >> I propose as a workaround to enable a kthread that will monitor the
> >> lid state and update the correct value to userspace (5 sec of polling
> >> time should be enough given that systemd checks every 20 sec).
> >> We should probably have this workaround only for a set of known
> >> devices, as it might just be temporary for those until the actual
> >> underlying problem is fixed (wrong DSDT in the Surface 3 case that
> >> doesn't notify at all, issue in the EC for the Surface Pro 1 and the
> >> Samsung N210).
> > [Lv Zheng]
> > That cannot help to solve the issue/gap.
> >
> > The problem is Linux userspace has a facility re-checking lid state when
> lid state is "close".
> > If it failed to update the lid state to "open" for a timeout period, it would
> suspend the platform.
> >
> > We actually are recommending Linus userspace to introduce a new lid
> handling mode to only handle "lid close" event and ignore "lid open" event.
> > During the period this is not changed from the userspace, we have to
> send "open" after resume/boot in order not to trigger the timeout
> mechanism.
> >
> I hear your points and I couldn't agree more that the only solution
> for us is (sadly) to mimic Windows.
> I am just disagreeing on the method used here: I'd rather have a
> fallback mechanism where the lid is not sent to "one" at each boot
> resume because this will mess us people using docking stations. The
> problem is that Red Hat has a lot of them, and I foresee a lot of
> complains to me if this is merged in its current state.
> We could have a parameter (say "send_lid_open_on_resume" or
> "use_bios_lid_value") in acpi/button.c which would allow people to
> choose if they want the "new" behavior or the old one. And we could
> also add some DMI matching for the messed up laptops/tablets where we
> force one or the other quirk. When we agree that user space now
> behaves gently with us, we could then remove entirely the quirk and
> the dmi matching.
> How does that sound?
[Lv Zheng]
The choices are in my first revision.
I could restore it back and re-send this series.
Also I need to update PATCH 02 to eliminate wrong IS_ERR_VALUE().

> Also, on the topic of the .notify not being sent by some BIOS, I have
> seen something similar on the surface 3.
> The ASL for the LID status change is the following:
> ---
> Method (_E4C, 0, Serialized) // _Exx: Edge-Triggered GPE
> {
> If ((HELD == One))
> {
> ^^LID.LIDB = One
> }
> Else
> {
> ^^LID.LIDB = Zero
> Notify (LID, 0x80) // Status Change
> }
> Notify (^^PCI0.SPI1.NTRG, One) // Device Check
> }
> ---
> I noticed I received the hotplug event on the touchscreen when the LID
> state changed, and tried to initiate the notify on the LID acpi device
> from within the touchscreen driver.
> This works well, except that from time to time, on resume, I don't
> even receive the hotplug notification. It is as if the notification is
> lost while I receive it 80% of the time.
[Lv Zheng]
Probably this is because of the same reason as mentioned before.
I still think Surface 3 is a different example.

On those traditional platforms, we can see that:
1. There are platforms returning hardware state directly from _LID
2. There are platforms returning cached state from _LID and update it via lid irq, EC._REG, _WAK
3. There are platforms returning cached state from _LID and update it via lid irq

The fact is there are simply so many type 3 platforms.
You might be thinking there are only several such kind of bogus devices.
But actually our team has been working on customizing those platforms, enhancing their EC._REG/_WAK for years in order to make the AML tables working on Linux.
This looks like an endless work to me...

> This might be completely unrelated, but I think it is worth mentioning
> because there might be a common root at the state not being updated
> correctly.
> Cheers, and sorry for being annoying :)
[Lv Zheng]
I didn't see anything annoying. :)


> Benjamin