Re: [regression] Since kernel 6.3.1 logitech unify receiver not working properly

From: Jiri Kosina
Date: Tue May 23 2023 - 08:31:25 EST


On Mon, 22 May 2023, Linus Torvalds wrote:

> > FWIW, in case anybody is interested in a status update: one reporter
> > bisected the problem down to 586e8fede79 ("HID: logitech-hidpp: Retry
> > commands when device is busy"); reverting that commit on-top of 6.3
> > fixes the problem for that reporter. For that reporter things also work
> > on 6.4-rc; but for someone else that is affected that's not the case.

FWIW, I was pretty much away for past few weeks as well, same as Benjamin
as Bastien. Which is unfortunate timing, but that's how things pan out
sometimes.

> Hmm. It's likely timing-dependent.
>
> But that code is clearly buggy.
>
> If the wait_event_timeout() returns early, the device hasn't replied,
> but the code does
>
> if (!wait_event_timeout(hidpp->wait, hidpp->answer_available,
> 5*HZ)) {
> dbg_hid("%s:timeout waiting for response\n", __func__);
> memset(response, 0, sizeof(struct hidpp_report));
> ret = -ETIMEDOUT;
> }
>
> and then continues to look at the response _anyway_.

Yeah; we are zeroing it out, but that doesn't really make things any
better in principle, given all the dereferences later.

The issue seems to be existing ever since 2f31c52529 ("HID: Introduce
hidpp, a module to handle Logitech hid++ devices") when this whole driver
was introduced, as far as I can tell.

> Now, depending on out hardening options, that response may have been
> initialized by the compiler, or may just be random stack contents.

Again, as in case of timeout the buffer is just zeroed out, I'd just much
more expect NULL pointer dereference in such case. Which is not what we
are seeing here.

> That bug is pre-existing (ie the problem was not introduced by that
> commit), but who knows if the retry makes things worse (ie if it then
> triggers on a retry, the response data will be the *previous* response).
>
> The whole "goto exit" games should be removed too, because we're in a
> for-loop, and instead of "goto exit" it should just do "break".
>
> IOW, something like this might be worth testing.
>
> That said, while I think the code is buggy, I doubt this is the actual
> cause of the problem people are reporting. But it would be lovely to
> hear if the attached patch makes any difference, and I think this is
> fixing a real - but unlikely - problem anyway.
>
> And obviously it might be helpful to actually enable those dbg_hid()
> messages, but I didn't look at what the magic config option to do so
> was.

dbg_hid() is just pr_debug(), which means that on kernels with
CONFIG_DYNAMIC_DEBUG, this makes use of the dynamic debug facility;
otherwsie it just becomes printk(KERN_DEBUG...).

Thanks,

--
Jiri Kosina
SUSE Labs