RE: [PATCH 2/2] Revert "ACPI / button: Change default behavior to lid_init_state=open"

From: Zheng, Lv
Date: Tue May 16 2017 - 04:30:34 EST


Hi, Benjamin

> > > > > >> >> > > > > For example, such a hwdb entry is:
> > > > > >> >> > > > > libinput:name:*Lid Switch*:dmi:*svnMicrosoftCorporation:pnSurface3:*
> > > > > >> >> > > > > LIBINPUT_ATTR_LID_SWITCH_RELIABILITY=write_open
> > > > > >> >> Well, if it worked in a specific way that users depended on before the commit in
> > > > > >> >> question and now it works differently, then it does break things.
> > > > > >> >> Benjamin, my understanding is that this is the case, is it correct?
> > > > > >> > That is correct. This patch I reverted introduces regression for professional
> > > > > >> > laptops that expect the LID switch to be reported accurately.
> > > > > >> And from a user's perspective, what does not work any more?
> > > > > > If you boot or resume your laptop with the lid closed on a docking
> > > > > > station while using an external monitor connected to it, both internal
> > > > > > and external displays will light on, while only the external should.
> > > > > > There is a design choice in gdm to only provide the greater on the
> > > > > > internal display when lit on, so users only see a gray area on the
> > > > > > external monitor. Also, the cursor will not show up as it's by default
> > > > > > on the internal display too.
> > > > > > To "fix" that, users have to open the laptop once and close it once
> > > > > > again to sync the state of the switch with the hardware state.
> > > > > OK
> > > > > Yeah, that sucks.
> > > > > So without the Lv's patch the behavior (on the systems in question) is
> > > > > as expected, right?
> > > > Yes, reverting these 2 patches restores the pre v4.11 kernel behavior.
> > > I would make an argument that:
> > > A. Is this necessarily a button driver regression?
> > > 1. Users already configured to not using internal display, why gdm need to determine it again
> instead
> > > of users choice?
> > > 2. Can gdm/graphics driver saves state before suspend, and restores saved state after resume?
> > > If users didn't change state during suspend, then everything should be correct.
> > > If users changed state during suspend, it should be acceptable for users to change it again to
> make
> > > the state correct.
> > > See, this is obviously a case that is not so strictly related to ACPI button driver.
> > > Why do we need to force button driver to marry external monitors.
> > > B. Bug reporters are all ok with using quirk modes as boot parameters to work this around.
> > > Why should we change our default behavior aimlessly?
> >
> > I have one more concern:
> > In button.lid_init_state=method mode,
> > Is that possible for libinput to work things around if _LID return value is not correct?
> > How libinput ensures correct timing of overwriting the input node value?
> > Will button driver faked event value overwrites what libinput has written?
> >
> > From this point of view, button.lid_init_state=ignore might be a better choice than
> button.lid_init_state=method to allow libinput to deal with all kind of cases.
> >
>
> This is my last email on this topic, I don't even want to fully read/answer
> the one in 1/2 given the amount of bad faith you put in that.

What's that?
I mean, the bad faith?

> This is a REGRESSION. It used to work on thousands of devices, it
> doesn't anymore. So any regression has to be chased down and no good
> reason can justify such a regression.

I triggered many such kind of layered regressions and did fix them 1 by 1 in different places.
However, this might be different.
Which depends on our agreement.

> The only solution is to revert both these changes. We can not ask user
> space to fix a kernel regression, it's not how it works.

Yes, I know.
We just asked users to use quirk modes of button driver.
And there is in fact always one of them working.
We haven't asked user space to change.
We are just discussing the correctness of some user space behaviors.

> You can not also change the semantic of an input switch. An input
> switch, as per the input subsystem is supposed to forward an actual
> state of the underlying hardware. Any fake information is bad and has to
> be avoided.

Since fake events are harmful, why do we fake an event after boot/resume?
button.lid_init_state=method seems can fake such an event.

> I already gave you 2 solutions to fix the 7 machines you see that are
> problematic, and you just seem to ignore them:
> - revert to the v4.10 behavior and let libinput fix that for you

I already chose this.
But I just raised a concern that button.lid_init_state=method could bring troubles to libinput quirks.

> - revert to the v4.10 behavior and have a quirk database in acpi/button
>
> I also proposed to take maintainership on this particular module because
> you said you were assigned this by default because you were the last
> introducing changes in it. I asked you twice, and two times you replied
> skipping this part.

I didn't, you just skipped PATCH 1/2 reply...
Let me copy it from that email:
=====
Actually there is no special maintainership related to the button driver.
All drivers of ACPI spec defined devices (PNPxxx stuffs) are maintained as a whole by Rafael.
And we are helping him by doing triage on kernel Bugzilla.

There isn't a category on kernel Bugzilla related to lid issues.
Probably you can help to create one under ACPI product.
If this can be achieved, you can be the default assignee for such issues.
If this cannot be achieved, we'll just forward some lid reports to you first.
Please tell me which product/component you'd prefer for us to forward.

For modifying acpi/button.c, it's open source, you can send patches and ACPI community can help to review.
=====

> It's clear you don't want to revert to the old state, and even if I can
> prove to you that you have to, you don't listen.
> So please, do not force me to call the maintainers and Linus on this
> simple 2 reverts.

No, I currently cannot persuade myself to revert to "method" mode.
But that doesn't mean I don't want to or I want to.
You just didn't prove that by answering my questions in PATCH 1/2 and in this email.
Especially:
1. After reverting to "method" mode, can libinput work all cases around?
Are there any timing issues that can prevent libinput from working some sucks around.
Considering this case, it's likely such problems can happen.
https://bugzilla.kernel.org/show_bug.cgi?id=106151
2. Who should be responsible for fixing this issue after reverting to "method" mode:
https://bugzilla.kernel.org/show_bug.cgi?id=106151
What's the root cause of this issue?
3. How can we generate the quirk for the following possible breakage:
No lid notification and _LID return cached open after boot/resume

Cheers,
Lv