Re: [PATCH] ELAN touchpad i2c_hid bugs fix
From: Dmitry Torokhov
Date: Fri Mar 29 2019 - 14:24:05 EST
On Fri, Mar 29, 2019 at 5:18 AM Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
>
> Hi,
>
> On 3/25/19 5:56 PM, Dmitry Torokhov wrote:
> > Hi Hans,
> >
> > On Mon, Mar 25, 2019 at 9:38 AM Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
> >>
> >> Hi Dmitry,
> >>
> >> On 25-03-19 17:02, Dmitry Torokhov wrote:
> >>> Hi Vladislav,
> >>>
> >>> On Mon, Mar 25, 2019 at 5:57 AM Vladislav Dalechyn
> >>> <vlad.dalechin@xxxxxxxxx> wrote:
> >>>>
> >>>> From: Vladislav Dalechyn <hotwater438@xxxxxxxxxxxx>
> >>>>
> >>>> Description: The ELAN1200:04F3:303E touchpad exposes several issues, all
> >>>> caused by an error setting the correct IRQ_TRIGGER flag:
> >>>> - i2c_hid incoplete error flood in journalctl;
> >>>> - Five finger tap kill's module so you have to restart it;
> >>>> - Two finger scoll is working incorrect and sometimes even when you
> >>>> raised one of two finger still thinks that you are scrolling.
> >>>>
> >>>> Fix all of these with a new quirk that corrects the trigger flag
> >>>> announced by the ACPI tables. (edge-falling).
> >>>
> >>> I do not believe this is right solution. The driver makes liberal use
> >>> of disable_irq() and enable_irq() which may lead to lost edges and
> >>> touchpad stopping working altogether.
> >>>
> >>> Usually the "extra" report is caused by GPIO controller clearing
> >>> interrupt condition at the wrong time (too early), or in unsafe or
> >>> racy fashion. You need to look there instead of adding quirk to
> >>> i2c-hid.
> >>
> >> The falling-edge solution was proposed by Elan themselves.
> >>
> >> Also if you look at: https://bugzilla.redhat.com/show_bug.cgi?id=1543769
> >>
> >> And esp. the "cat /proc/interrupts" output there, then you will see
> >> that the interrupt seems to be stuck at low level, which according
> >> to the ACPI tables is its active level.
> >
> > So how does it generate a new edge if it is stuck at low?
> >
> > Is it bad touchpad firmware that does not deassert interrupt quickly
> > enough?
>
> I do not believe it is not de-asserting it quick enough (I believe
> the amount of interrups is too high for that.
>
> It seems to simply be low most of the time, or it is really really
> slow with de-asserting.
>
> Vladislav can you check the output of /cat/interrupts on a kernel
> without the patch and while *not* using the touchpad; and check
> if the amount of touchpads-interrupts still keeps increasing in this
> case?
>
> Also I believe that you had contact with Elan about this and they
> told you to change the interrupt type to falling-edge as work-around,
> right? Can you ask them why?
KT, do you know anything about using edge interrupts with your
hid-over-i2c products?
>
> This is quite unususal, I've collected quite a few DSDTs over time
> and I've just checked about 40 of them all with a PNP0C50 in
> some form (and in many cases multiple such devices) and NONE of
> them is using edged-interrupts in the ACPI config.
That is because MS spec for HID over I2C requires level interrupts:
"7.4 Interrupt Line Management
DEVICE is responsible to assert the interrupt line (level trigger
interrupt required) when it has data.
DEVICE must keep the interrupt line asserted until the data has been
fully read by the HOST. After this point, the DEVICE must de-assert
the interrupt line if it has no more data to report to the HOST.
When the DEVICE has new data to report, it must repeat the process above."
See https://docs.microsoft.com/en-us/previous-versions/windows/hardware/design/dn642101(v=vs.85)
>
> <speculation>
>
> I think that the Elan touchpad firmware supports a mode to
> work on devices which only support edge interrupts and that
> this mode is accidentally enabled in this firmware.
>
> I think that the interrupt line is simply low all the time
> and gets pulsed high then low again when the touchpad detects
> a finger. Hopefully it does this pulsing on every event and not
> only when its event "fifo" is empty.
This behavior would violate HID-over-I2C spec though.
>
> </speculation>
>
> > I scrolled through the bug but I do not see if it had been
> > confirmed that original windows installation actually uses edge (it
> > may very well be using it; Elan engineers pushed us to use edge in a
> > few cases, but they all boiled down to an issue with pin control/GPIO
> > implementation).
>
> This has not been checked on Windows AFAIK.
>
> >> As for this being a GPIO chip driver problem, this is using standard
> >> Intel pinctrl stuff, which is not showing this same issue with many
> >> other i2c-hid touchpads.
> >
> > Well, there have been plenty of issues in intel drivers, coupled with
> > "interesting" things done by firmware and boards.
> >
> > If you want to keep on using edge you need to make sure that i2c-hid
> > never loses edge, as replaying of previously disabled interrupts in
> > not at all reliable. So you need to "kick" the device after
> > enable_irq() by initiating read from it and be ready to not get any
> > data or get valid data and process accordingly.
>
> That is a good point and I agree.
>
> Vladislav, let me explain this a bit. Normally the touchpad
> driver the IRQ line low when it has touch-data to respond, which means
> that if touch-data is reported before the driver loads (or while
> the driver has the irq disabled during e.g. suspend), it will immediately
> see an interrupt. If we use edge mode then the IRQ will only trigger
> when the IRQ line goes from high to low, if this happens when the driver
> is not listening then we do not see the edge. And since we never read the
> pending touch-data, the IRQ line never goes high again (which it does
> when we have read all available data), so we will never see a negative-edge
> and then things are stuck.
>
> It would be good, if running a kernel with your patch, you can try
> to trigger this by:
>
> 1) Suspending the machine by selecting suspend from a menu in your
> desktop environment, or by briefly pressing the power-button, do
> not close the lid
> 2) As soon as the system starts suspending and while it is suspended, move
> your finger around the touchpad
> 3) Wake the system up with the powerbutton while moving your finger around
> 4) Check if the touchpad still works after this
>
> Or by:
>
> 1) Using ctrl + alt + f3 to switch to a text console
> 2) Move finger around on touchpad, keep moving it around
> 3) Switch back to X11 with alt + f2 or alt + f7, while still moving the finger
> 4) Check if the touchpad still works after this
>
> If neither causes the touchpad to stop working, then at least the problem Dmitry
> fears is not easy to trigger, but we should probably still prepare to deal
> with it; and we really should try to better understand the problem here, so
> if you can answer my questions above, then that would be great.
Thanks.
--
Dmitry