Re: [regression] Since kernel 6.3.1 logitech unify receiver not working properly

From: Benjamin Tissoires
Date: Wed May 24 2023 - 11:28:41 EST

On Tue, May 23, 2023 at 2:31 PM Jiri Kosina <jikos@xxxxxxxxxx> wrote:
> On Mon, 22 May 2023, Linus Torvalds wrote:
> > > FWIW, in case anybody is interested in a status update: one reporter
> > > bisected the problem down to 586e8fede79 ("HID: logitech-hidpp: Retry
> > > commands when device is busy"); reverting that commit on-top of 6.3
> > > fixes the problem for that reporter. For that reporter things also work
> > > on 6.4-rc; but for someone else that is affected that's not the case.
> FWIW, I was pretty much away for past few weeks as well, same as Benjamin
> as Bastien. Which is unfortunate timing, but that's how things pan out
> sometimes.
> > Hmm. It's likely timing-dependent.
> >
> > But that code is clearly buggy.
> >
> > If the wait_event_timeout() returns early, the device hasn't replied,
> > but the code does
> >
> > if (!wait_event_timeout(hidpp->wait, hidpp->answer_available,
> > 5*HZ)) {
> > dbg_hid("%s:timeout waiting for response\n", __func__);
> > memset(response, 0, sizeof(struct hidpp_report));
> > ret = -ETIMEDOUT;
> > }
> >
> > and then continues to look at the response _anyway_.
> Yeah; we are zeroing it out, but that doesn't really make things any
> better in principle, given all the dereferences later.
> The issue seems to be existing ever since 2f31c52529 ("HID: Introduce
> hidpp, a module to handle Logitech hid++ devices") when this whole driver
> was introduced, as far as I can tell.

Yep, that was on me. But the weird part is that I should be able to
reproduce this locally then, but I don't.

> > Now, depending on out hardening options, that response may have been
> > initialized by the compiler, or may just be random stack contents.
> Again, as in case of timeout the buffer is just zeroed out, I'd just much
> more expect NULL pointer dereference in such case. Which is not what we
> are seeing here.

Returning -ETIMEDOUT seems good to me FWIW.

> > That bug is pre-existing (ie the problem was not introduced by that
> > commit), but who knows if the retry makes things worse (ie if it then
> > triggers on a retry, the response data will be the *previous* response).
> >
> > The whole "goto exit" games should be removed too, because we're in a
> > for-loop, and instead of "goto exit" it should just do "break".
> >
> > IOW, something like this might be worth testing.
> >
> > That said, while I think the code is buggy, I doubt this is the actual
> > cause of the problem people are reporting. But it would be lovely to
> > hear if the attached patch makes any difference, and I think this is
> > fixing a real - but unlikely - problem anyway.

FWIW, Linus, your patch is
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@xxxxxxxxxx>

Feel free to submit it to us or to apply it directly if you prefer as
this is clearly a fix for a code path issue.

I am barely struggling with everything now that I am back from last
week, being sick at the beginning of the week and still not feeling
completely well doesn't help.


> >
> > And obviously it might be helpful to actually enable those dbg_hid()
> > messages, but I didn't look at what the magic config option to do so
> > was.
> dbg_hid() is just pr_debug(), which means that on kernels with
> CONFIG_DYNAMIC_DEBUG, this makes use of the dynamic debug facility;
> otherwsie it just becomes printk(KERN_DEBUG...).
> Thanks,
> --
> Jiri Kosina
> SUSE Labs