Re: [PATCH v10 1/4] HID: hid-msi: Add MSI Claw configuration driver

From: Derek John Clark

Date: Tue Jun 02 2026 - 22:33:39 EST


On Fri, May 29, 2026 at 11:29 AM Dmitry Torokhov
<dmitry.torokhov@xxxxxxxxx> wrote:
>
> On Thu, May 28, 2026 at 11:34:18PM -0700, Derek John Clark wrote:
> > On Wed, May 27, 2026 at 10:32 PM Dmitry Torokhov
> > <dmitry.torokhov@xxxxxxxxx> wrote:
> > >
> > > Hi Derek,
> > >
> > > On Wed, May 27, 2026 at 10:21:19PM +0000, Derek J. Clark wrote:
> > > > Adds configuration HID driver for the MSI Claw series of handheld PC's.
> > > > In this initial patch add the initial driver outline and attributes for
> > > > changing the gamepad mode, M-key behavior, and add a WO reset function.
> > > >
> > > > Sending the SWITCH_MODE and RESET commands causes a USB disconnect in
> > > > the device. The completion will therefore never get hit and would trigger
> > > > an -EIO. To avoid showing the user an error for every write to these
> > > > attrs a bypass for the completion handling is introduced when timeout ==
> > > > 0.
> > > >
> > > > The initial version of this patch was written by Denis Benato, which
> > > > contained the initial reverse-engineering and implementation for the
> > > > gamepad mode switching. This work was later expanded by Zhouwang Huang
> > > > to include more gamepad modes. Finally, I refactored the drivers data
> > > > in/out flow and overall format to conform to kernel driver best
> > > > practices and style guides. Claude was used as an initial reviewer of
> > > > this patch.
> > >
> > > I wonder why do you need to roll asynchronous probing and asynchronous
> > > resume by hand? This I think complicates the driver greatly and forces
> > > you to use a ton of works, spinlocks, and checks.
> > >
> > > Thanks.
> > >
> >
> > Hi Dmitry,
> >
> > I suppose being asked this means my cover letter and commit
> > descriptions need some additional context. The MCU in these Claw
> > devices is quite temperamental. There are a few specific issues that
> > cause the need for multiple work queues, a serialization mutex, and
> > subsequently spinlocks to prevent stale data reads.
> >
> > 1.) The MCU will halt function if it receives any output reports
> > before ~500MS after probe or resume. This can either manifest as the
> > device never responding to a command, or it can cause the entire
> > system to become unstable and reboot. This creates the need for
> > cfg_setup to query the MCU and then add the gamepad attrs, led_mc
> > device, and rgb attrs. As a side effect, because a system could
> > technically be suspended during that 500ms delay, there exists the
> > need to re-queue the work if it was never triggered, hence the resume
> > queue.
> > 2.) The MCU will not always respond in order if two or more output
> > reports are sent within a few ms of each other. Since many of the
> > commands use a generic "ACK", or share an "ACK" type but don't provide
> > specific context about what sub-function called them, we could
> > potentially have cross talk where data is saved in the wrong attribute
> > or errors propagate because of a missed message. To get around this
> > serialization issue we hold a mutex through a completion triggered in
> > raw_event and, for most events, save a state machine on what command
> > is expected and what sub-command was the initiator. (I.E. profile
> > events handle the M1, M2, RGB, Left rumble, and right rumble). Since
> > the state machine is accessed on both sides, we need spinlocks
> > guarding the reads. This essentially serializes the data and makes it
> > predictable. Using this pattern I haven't had any issues reading from
> > or writing to the MCU.
> > 3.) Some commands will never return their "ACK" while a completion is
> > held, so we have a workaround to basically ignore them and hope the
> > command worked. This is only needed for SYNC_TO_ROM, from which we
> > don't need to set anything on its "ACK", and switching the gamepad
> > mode, which causes USB disconnect/reconnect and the driver fully
> > reloads, so we'll never be able to read it anyway.
> > 4.) The RGB work queue is used to free the userspace write while the
> > completion is held. I found that use without it could stall userspace
> > quite significantly if it has multiple writes back to back. I
> > experienced this using Steam's customization menu, which sends a
> > single write for every increment of its color and brightness sliders.
> > when traversing the full length of the slider it is possible to have
> > effects changing for nearly a minute after stopping. With the queue,
> > only the most recent write is eventually sent to the MCU. This issue
> > also affects the Go 2 driver as well, though not to the same extent,
> > but for which I'll be adding a similar de-bounce queue soon. Go S is
> > also technically affected by this bug, but that returns quickly enough
> > that it isn't really feasible to trigger the bug with much frequency.
> > I'll still fix that one as well though.
> >
> > TBH I'm not "happy" with the complexity of the driver, but I don't see
> > a reasonable alternative. If you have any specific suggestions that I
> > could try that might simplify it, I'd be more than happy to give it a
> > shot. That being said, I'm not very optimistic about it. Development
> > on this device has been like wrestling a bear.
>
> Thank you for this detailed explanation. I would like to concentrate on
> the #1 first. What happens in the driver is you are essentially rolling
> asynchronous probing and asynchronous suspend/resume in the driver
> itself, and end up fighting with the kernel and the driver core
> specifically.
>
> As far as suspend/resume goes: HID subsystem already enables
> asynchronous resume handling (checkout the call to
> device_enable_async_suspend() in
> drivers/hid/hid-core.c::hid_allocate_device()). Therefore I think you
> just need to stick the necessary delay in your resume method() and call
> it a day.
>
> For the probing I would look into annotating the driver as
> PROBE_PREFER_ASYNCHRONOUS and relying on that. Again, if you stick the
> required delay in probe then sysfs attributes will not be created too
> early, same for the rest of concerns with the device being exposed to
> userspace before it is ready to handle requests.

Hi Dmitry,

I've done a bit of a deep dive into this to try and get it to work
without work queues while using PROBE_PREFER_ASYNCHRONOUS and regular
sleeps. While it does simplify the driver, I can't get it to function
properly. No matter how long I wait the probe fails with an -EBUSY as
raw_event never fires during the probe context, and my completion
handler times out. The most likely explanation I have found is that
the MCU must require all interfaces to be fully bound before it will
respond to interrupt URB completions (vice needing to sleep for 500ms
for the MCU to be ready as I previously thought). During probe the
driver core holds the parent USB device lock for the duration of each
interface's probe. Since the other interfaces cannot start binding as
they are blocked waiting for the parent lock, and the MCU is waiting
for them to bind before responding, no ACK arrives within the timeout
window and the probe fails. The other interfaces only begin probing
after the config interface probe returns and releases the parent lock,
as confirmed by dmesg timestamps. When using the work_queue method all
interfaces have finished probing, so the MCU is responding to
interrupt URB completions and everything works.

> If there are issues with HID subsystem honoring
> PROBE_PREFER_ASYNCHRONOUS I would look into fixing the subsystem rather
> than try to work around it in the driver.

Since a lock is held on the parent while each interface is attached,
and USB devices have multiple interfaces, this happens synchronously
even in "async" context. The HID core calls driver_attach, which
detects the async flag and calls __driver_attach_async_helper, which
then calls driver_probe_device after locking the parent.

>From drivers/base/dd.c:
static void __driver_attach_async_helper(void *_dev, async_cookie_t cookie)
{
struct device *dev = _dev;
const struct device_driver *drv;
int ret;

__device_driver_lock(dev, dev->parent);
drv = dev->p->async_driver;
dev->p->async_driver = NULL;
ret = driver_probe_device(drv, dev);
__device_driver_unlock(dev, dev->parent);

dev_dbg(dev, "driver %s async attach completed: %d\n", drv->name, ret);

put_device(dev);
}
...
* driver_probe_device - attempt to bind device & driver together
...
* This function must be called with @dev lock held. When called for a
* USB interface, @dev->parent lock must be held as well.

One possible solution might be to add a way to match on
bInterfaceNumber in hid_device_id and allow the interfaces I don't
care about to stay bound to hid_generic. Another would be to
parallelize the interface probing. I'm not sure about the feasibility
of either approach since they would have significant changes and broad
implications for all USB/USBHID drivers. I also don't know if that
would even solve the problem.

Thanks,
Derek

> Thanks.
>
> --
> Dmitry