RE: [PATCH 1/2] Revert "ACPI / button: Remove lid_init_state=method mode"

From: Zheng, Lv
Date: Tue May 16 2017 - 01:05:32 EST


Hi, Benjamin

I reordered the discussion to collect topics and delete things to make discussion shorter.

1. root caused issue:

> > It seems we just need to determine the following first:
> > 1. Who should be responsible for solving bugs triggered by the conflict between bios and linux user
> space expectations:
> > button driver? libinput? Some other user space programs? Users?
> Hopefully libinput or systemd (through a udev rule). If things gets
> worse a acpi/button quirk might be used, but in a second time.

I have concerns about what's in your mind. :)

So let me high light a root caused issue:
https://bugzilla.kernel.org/show_bug.cgi?id=106151
If we use any "open" modes, the suspend/resume loop can be fixed.
Both "ignore/method" modes cannot fix the problem.
In this bug, lid open event has a huge delay. But it can correctly arrive.
However systemd will force 2nd suspend if it cannot see "open" event instantly after resume.
So why don't systemd fix the issue (the enforcement) prior than letting us (input layer/button driver) to invent workarounds?
IMO, this is a root caused problem and should be the first priority.
Before seeing it is addressed in systemd, any changes made to libinput/button driver may not be proper.

So the order of fixing all troubles are the followings in my mind:
1. system -> should eliminate the enforcement first
2. libinput -> may change lid event type (see my reply below for topic 2)
3. button driver

> > > > > The issue we are fixing here is the fact that the switch state is wrong,
> > > > > which makes user space assumptions wrong too (and users angry).
> > > > Considering no platform quirks.
> > > > If ACPI button driver sends SW_LID, users are likely angry.
> > > > Unless the user space programs are changed to "ignore open event".
> > > > If ACPI button driver doesn't send switch events, but key events.
> > > > The user space programs need to change to handle the new events.
> > > > So finally whatever we do, user space need to change a bit related to ACPI control method lid
> device.
> > > Or we fix the switch to report accurate events/state.
> > > You do realise that all the energy you are spending, answering to me,
> > > talking to user space maintainers, users, all comes down to the fact
> > > that you refuse to have hardware quirks?

Yes, as we have a root caused but unfixed issue in systemd first.
It's pointless to introduce hardware quirks at this point.

> > Thus there is no possible __FIX__ for acpi button driver and libinput.
> I never talked about a fix. I know the situation is unsolvable, which is
> why I talked about quirks or workarounds.
> > While user space programs can just fix their usage models.
> You can't expect user space to change anything from the kernel point of
> view without a long enough warning.

Why cannot we expect so?
The above issue has already been root caused.

> > > > > > However, is that possible to not introduce platform quirks?
> > > > > > For example, for systemd:
> > > > > > If it detected ACPI lid device, automatically switch to an "IgnoreLidOpenEvent" mode (like
> > > nouveau
> > > > > drivers' ignorelid=Y option).
> > > > > Well, given that 99.9% of laptops have this ACPI lid button, you'll just
> > > > > remove the feature from all laptops.
> > > > No, I only removed the wrong usage models related to the strict "open" event.
> > > > All laptops are still capable of sending correct "close" event.
> > > My bad, I read too fast and missed the "...Open..." part of
> > > "IgnoreLidOpenEvent".
> > > Though I am not sure IgnoreLidOpenEvent is accurate.
> > > "OpenEventNeverSent" seems to reflect more the reality. But again,
> > > that's not of our (kernel developers) business.
> > IMO this is the only root cause fix. :)
> > It's the only way that the user can use without changing its quirk modes for different usage models.
> Yes and no. There is a design issue in logind, but this is based on the
> design choice made in acpi/button: we use an input switch. A switch has
> a state, and the kernel ought to report correct state. With this in
> mind, the design choice in logind is correct. But now we are
> seeing that some OEM are not providing everything we need, and we need
> to find the solution.

We can stop arguing this.
First of all, we just need to check if systemd can remove the enforcement mentioned above.
And what will happen next.
It's likely that there will be no user issues or button driver design issues then.
Thus finally we may needn't introduce any "hardware quirks".

2. keep "method" mode:

> > Button driver default behavior should be (not 100% sure if this is your opinion):
> button.lid_init_state=method
>
> Yes, I'd like to revert to the old behavior (see below for a rationale).
...
> > 2. What should be the default button driver behavior?
> > button.lid_init_state=ignore? button.lid_init_state=method?
> button.lid_init_state=method:
> - this was the old default state, and switching to something else
> creates regression
> - the lid switch input node is an EV_SWITCH. And this has a state we
> need to synchronise with the hardware first. The default EV_SW state
> is closed, so if we do not sync it first, we might report wrong state
> to user space.
...
> > If we can use libinput to manage platform quirks, then button.lid_init_state=method also looks
> useless.
> No. 'method' is the only way to guarantee the exported input node is
> synced properly with the actual physical world.
> > The question is: Should button.lid_init_state=method be kept?
> Definitively.

Don't you think we actually have a new feature here?
See, many years ago, all power buttons are on keyboard.
Now all power buttons are on the edge, and many laptops have docking station.
Don't you think we need to re-define the input event type of lid switch?
Or we need to drive users of SW_LID/ACPI lid event to be not so strict to its switch event nature.

So if SW_LID itself or its users are changed in this direction, will that really be useful to keep "method"?

> Switching back to button.lid_init_state=method will prevent regressions
> to be filled. If a user buys a new machine and the LID switch doesn't
> work as expected, it's not a regression, it's a bug. Regression is
> something we absolutely need to avoid, and switching to
> button.lid_init_state=open introduced far too many regressions on
> systems.

IMO, whether "method" can be the default behavior depending on 2 factors:
1. We are discussing on freedesktop to see if things can be improved in graphics world;
2. topic 1 issue can be fixed.
If graphics world is not changed and issue 1 is fixed, we can use "method" mode.
If graphics world is changed and issue 1 is fixed, we can use "ignore" mode.
If graphics world is changed and issue 1 is not fixed, we should stay using "open" mode.
So it might be pointless to determine the default behavior such early.
Given users can work things around using boot parameters.
Aggressively making such a change now just make bug fixing work going back and forth.

3. keep "open/close" mode

> We should keep button.lid_init_state=open:
> - it's kernel API now, so you can't remove it without breaking users
> configuration
> - it helps booting laptops that are not known to be working, the time
> the user installs the distribution and add the hwdb entry to fix it.
> > If user space programs have special needs, they can fix them on their own, via the following mean:
> > libinput:name:*Lid Switch*:dmi:*svnMicrosoftCorporation:pnSurface3:*
> > LIBINPUT_ATTR_LID_SWITCH_RELIABILITY=write_open
> > From this point of view, we actually don't need open/close modes.
> We don't need close, but we need to keep open, for 2 reasons I'll detail
> below.
> > 3. If non button driver quirks are working, button driver quirk modes are useless.
> > The question is: Should button.lid_init_state=open/close be kept?
> we should not keep button.lid_init_state=close:
> - I don't see a good use case where it is needed, besides debugging
> drivers that should be debugged in an other way.

If we presuppose that SW_LID cannot change, we surely need quirk modes for some new usage model issues.
IMO,
If quirks are done in libinput, then we needn't "open/close".
If button driver need to provide quirks on its own, then we need both "open/close" to be able to deal with all cases.
I already listed a use case that "open/method" cannot deal with:
No lid notification and _LID return a cached "open" value after boot/resume.
External monitor is connected and lid is closed.
But we haven't reached an agreement yet.
So please answer the following question.
Could you tell me what the quirk mode (button driver mode or libinput node writes, whatever) should we use for the following cases:
1. No lid notification and _LID return value is a cached open after boot/resume
1.1. External monitor is connected, lid is closed
1.2. External monitor is connected, lid is opened
2. No lid notification and _LID return value is a cached close after boot/resume
2.1. External monitor is connected, lid is closed
2.2. External monitor is connected, lid is opened

4. quirk tables

> > > If you don't want to write/maintain the code, fine. I can do it, but
> > > please stop trying to make everyone else change because you just don't
> > > want the burden of quirking a handful of laptops.
> > See the base line from my side is very clear:
> > If acpi community need to handle such bug reports, button.lid_init_state=method cannot be the
> default behavior.
> I already proposed to handle those reports. I don't see why you are
> concerned about those future reports.
> > So finally we actually cannot fix anything by maintaining the quirk table, but just create a
> business of maintaining the quirk table.
> Yes, but I never asked you to maintain such table. I am volunteering to
> do that if needed.
> Would you agree on:
> - acking both my patches so Rafael can take them ans stable can apply
> them ASAP
> - give me maintainership of drivers/acpi/button.c (ACK a following patch
> where I add me to MAINTAINERS)
> - redirect all kernel bugs to me related to LID switch (or having me the
> default assignee for a new acpi/button component)
> - allow me to add a list of quirk in acpi/button.c if I judge this as
> fit

As reasons mentioned in topic 1 and 2.
For now, I'd rather to see progresses made in systemd/libinput first.

Actually there is no special maintainership related to the button driver.
All drivers of ACPI spec defined devices (PNPxxx stuffs) are maintained as a whole by Rafael.
And we are helping him by doing triage on kernel Bugzilla.

There isn't a category on kernel Bugzilla related to lid issues.
Probably you can help to create one under ACPI product.
If this can be achieved, you can be the default assignee for such issues.
If this cannot be achieved, we'll just forward some lid reports to you first.
Please tell me which product/component you'd prefer for us to forward.

For modifying acpi/button.c, it's open source, you can send patches and ACPI community can help to review.

Thanks and best regards
Lv