Re: [PATCH 5.4 regression fix] Input: soc_button_array - partial revert of support for newer surface devices

From: Yauhen Kharuzhy
Date: Sat Oct 05 2019 - 08:24:59 EST


On Sat, Oct 05, 2019 at 12:55:51PM +0200, Hans de Goede wrote:
> Commit c394159310d0 ("Input: soc_button_array - add support for newer
> surface devices") not only added support for the MSHW0040 ACPI HID,
> but for some reason it also makes changes to the error handling of the
> soc_button_lookup_gpio() call in soc_button_device_create(). Note ideally
> this seemingly unrelated change would have been made in a separate commit,
> with a message explaining the what and why of this change.
>
> I guess this change may have been added to deal with -EPROBE_DEFER errors,
> but in case of the existing support for PNP0C40 devices, treating
> -EPROBE_DEFER as any other error is deliberate, see the comment this
> commit adds for why.
>
> The actual returning of -EPROBE_DEFER to the caller of soc_button_probe()
> introduced by the new error checking causes a serious regression:
>
> On devices with so-called virtual GPIOs, soc_button_lookup_gpio() will
> always return -EPROBE_DEFER for these fake GPIOs. When this happens
> during the second call of soc_button_device_create(), we have already
> successfully registered our first child. This causes the kernel to think
> we are making progress with probing things, even though we unregister the
> child again before we return the -EPROBE_DEFER. Since we are making
> progress, the kernel will retry deferred probes again immediately, ending
> up stuck in a loop with the following showing in dmesg:
>
> [ 124.022697] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6537
> [ 124.040764] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6538
> [ 124.056967] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6539
> [ 124.072143] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6540
> [ 124.092373] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6541
> [ 124.108065] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6542
> [ 124.128483] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6543
> [ 124.147141] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6544
> [ 124.165070] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6545
> [ 124.179775] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6546
> [ 124.202726] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6547
> <continues on and on and on>
>
> One CPU core then stays stuck at 100% and udev hangs, since it is waiting
> for the modprobe of soc_button_array to return.
>
> This patch reverts the soc_button_lookup_gpio() error handling changes,
> fixing this regression.

I can confirm this issue. I made a similar hotfix while working on other
things, so the problem is real and the patch should work. I will test it on
my hardware soon.
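
For illustration only, the retry loop described in the quoted text can be
modelled with a small userspace C sketch. This is not kernel code:
toy_lookup_gpio(), toy_probe(), toy_driver_core() and the two counters are
hypothetical names, and the real driver core's deferred-probe logic is more
involved. The sketch assumes only what the commit message states: a virtual
GPIO always yields -EPROBE_DEFER, and registering a child counts as progress
that triggers an immediate retry.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* The kernel-internal EPROBE_DEFER is 517; it is not in userspace errno.h. */
#define TOY_EPROBE_DEFER 517

static int progress_events; /* toy stand-in for "a device was registered" */
static int live_children;   /* children currently registered by the probe */

/* Hypothetical lookup: index 0 is a real GPIO, index 1 is a virtual GPIO
 * that never resolves and always asks for a deferral. */
static int toy_lookup_gpio(int index)
{
    return index == 0 ? 42 : -TOY_EPROBE_DEFER;
}

/* One probe attempt with two "device create" steps.
 * propagate_defer == true models the regressed error handling,
 * propagate_defer == false models treating the deferral like any other
 * lookup failure and skipping that GPIO. */
int toy_probe(bool propagate_defer)
{
    live_children = 0;
    for (int i = 0; i < 2; i++) {
        int gpio = toy_lookup_gpio(i);

        if (gpio < 0) {
            if (propagate_defer && gpio == -TOY_EPROBE_DEFER) {
                live_children = 0;        /* unregister earlier children */
                return -TOY_EPROBE_DEFER; /* ...and ask to be retried */
            }
            continue; /* skip this GPIO, keep going */
        }
        live_children++;
        progress_events++; /* registering a child counts as progress */
    }
    return live_children ? 0 : -ENODEV;
}

/* Toy driver core: retry a deferred probe whenever progress was made since
 * the previous attempt. max_rounds only exists so the model terminates.
 * Returns the number of probe attempts that were made. */
int toy_driver_core(bool propagate_defer, int max_rounds)
{
    int rounds = 0, seen, ret;

    assert(max_rounds > 0);
    progress_events = 0;
    do {
        seen = progress_events;
        ret = toy_probe(propagate_defer);
        rounds++;
    } while (ret == -TOY_EPROBE_DEFER && progress_events != seen &&
             rounds < max_rounds);
    return rounds;
}
```

Under these assumptions, toy_driver_core(true, 1000) keeps retrying until it
hits the round cap, while toy_driver_core(false, 1000) finishes after a
single probe with one child left registered.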

--
Yauhen Kharuzhy