Re: Bug: Elan v3 touchpad's reported size is half of actual touch coordinates

From: nyanpasu
Date: Fri Sep 02 2022 - 02:46:02 EST


This bug has not been resolved, and is reappearing on my system with
each kernel update, requiring that I rebuild and reinstall the kernel
module. Can a fix be implemented?

On Mon, Jul 18, 2022 at 2:34 PM nyanpasu <nyanpasu256@xxxxxxxxx> wrote:
>
> I noticed that on Linux 5.18.11, on my laptop (Dell Inspiron 15R SE
> 7520) with an Elan v3 touchpad (dmesg says "with firmware version
> 0x450f02"), the reported size of my touchpad (in userspace by calling
> mtdev_configure() and libevdev_get_abs_maximum(), in kernel space
> elantech_device_info::x_max/y_max, either way 1470 by 700) is half
> that of the actual touch range (2940 by 1400), and the upper half of
> my touchpad reports negative values. As a result, with the Synaptics
> X11 driver set to edge scrolling mode, the entire right half of my
> touchpad has x-values past evdev's reported maximum size, and acts as
> a giant scrollbar!
>
> The bug arises due to an error in drivers/input/mouse/elantech.c
> (which has been unchanged since 2021, in both torvalds and
> dtor/input).
>
> In this file, elantech_init_ps2() first calls
> elantech_query_info(struct psmouse *psmouse, struct
> elantech_device_info *info), which calls elantech_set_properties() to
> initialize info->hw_version = 3, then on version 3 touchpads requests
> ETP_FW_ID_QUERY to initialize info->x_max and y_max.
>
> Afterwards, elantech_init_ps2() calls elantech_setup_ps2(struct
> psmouse *psmouse, struct elantech_device_info *info), which calls
> elantech_set_absolute_mode(struct psmouse *psmouse), which if
> (etd->info.set_hw_resolution) (true on my machine), writes 0x0b to
> etd->reg_10 and sends it to the hardware.
>
> The problem is that elantech_setup_ps2() ->
> elantech_set_absolute_mode() sets up absolute mode and doubles the
> hardware resolution (doubling the hardware's maximum reported x/y
> coordinates and its response to ETP_FW_ID_QUERY), *after*
> elantech_query_info() fetches the touchpad coordinate system size
> using ETP_FW_ID_QUERY, which gets cached and reported to userspace
> through ioctl(fd, EVIOCGABS(ABS_X/Y), ...). So the touchpad size
> reported to userspace (and used to subtract vertical coordinates from)
> is half the maximum position of actual touches.
>
> I've considered and implemented multiple possible fixes to this issue.
>
> The easiest solution (or hack) is for elantech_set_absolute_mode() to
> first write reg_10 to hardware, then if (etd->info.set_hw_resolution)
> is true, call elantech_query_info() a second time. However this
> re-requests device firmware version etc. and rewrites state, which I
> dislike because it could theoretically change fields in
> elantech_device_info you didn't mean to.
>
> You can't enable absolute mode first before calling
> elantech_query_info(), because elantech_query_info() is responsible
> for probing hardware and identifying if it's v3, and
> elantech_set_absolute_mode() depends on it.
>
> You *could* make elantech_query_info() only query the hardware version
> before returning, then enable absolute mode before calling a new
> function to identify the touchpad size. I did not take this approach
> because I didn't know how much code to move from elantech_query_info()
> into a new function called after elantech_set_absolute_mode()
> (everything up to ETP_ICBODY_QUERY probably not, ETP_FW_ID_QUERY
> definitely should be, and probably elantech_get_resolution_v4's
> ETP_RESOLUTION_QUERY for consistency). I think it's safe to move only
> "query range information" and possibly "check for the middle button:
> DMI matching or new v4 firmwares", and elantech_get_resolution_v4()
> (for consistency), to after elantech_set_absolute_mode(). However,
> this change affects the order of hardware commands on all hardware
> (not just the v3 Elan I have), and I cannot verify it works on other
> Elan touchpads.
>
> The most surgical fix I could think of (minimizing changes to behavior
> from the current Linux driver), is fetching *only* ETP_FW_ID_QUERY
> (touchpad size) a second time after elantech_set_absolute_mode()
> enables absolute mode and double-size mode. I have a local clone of
> Linux, which I edited to split out the ETP_FW_ID_QUERY request into a
> helper function elantech_query_range_v3(), which gets called by
> elantech_query_info() before boosting resolution, and by
> elantech_set_absolute_mode() after/if it boosts resolution. I built
> drivers/input/mouse and installed the resulting psmouse.ko, which
> fixes the bug (the touchpad size reported to userspace is 2940 by
> 1400, matching the range of touch events) and fixes edge scrolling.
> Should I submit this patch (alongside patches removing
> elantech_data::y_max/width which are redundant with
> elantech_data::info.y_max/width) or not?
>
> However I do not know how the Windows Dell/Elan v3 driver (or other
> hardware versions' drivers) initialize the touchpad and driver,
> specifically what order it sends various queries. I only have an Elan
> v3 touchpad (no other Elan hardware/driver pairs, just Alps and
> Synaptics) to study, do not have a logic analyzer to read the commands
> off the wire (though perhaps I could rig one up fast enough for PS/2
> using a Pi's GPIO pins), and have not tried decompiling that driver in
> Ghidra (which may not be clean-room safe, but I don't know if the
> current driver is clean-room or not, since elantech.rst mentions
> "name[s] given in the Dell-provided driver"). Is it critical to match
> the Windows driver's command order, or can we use any order that
> works?
>
> (Out of curiosity, given that PS/2 runs at <20 KHz and the Elan driver
> is written with blocking calls, is mouse processing code run in a
> dedicated kernel thread to avoid blocking other kernel tasks? I'm not
> an experienced kernel developer but a drive-by contributor, so I don't
> quite know how kernel concurrency works.)
>
> (Note I'm not subscribed to linux-input or linux-kernel, please CC me
> in replies.)