Re: [PATCH] platform/chrome: cros_ec_proto: Lock device when updating MKBP version

From: Patryk Duda
Date: Tue Jul 30 2024 - 04:05:38 EST


On Tue, Jul 30, 2024 at 8:04 AM Tzung-Bi Shih <tzungbi@xxxxxxxxxx> wrote:
>
> On Mon, Jul 29, 2024 at 01:57:09PM +0200, Patryk Duda wrote:
> > On Mon, Jul 29, 2024 at 5:47 AM Tzung-Bi Shih <tzungbi@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Jul 25, 2024 at 05:57:13PM +0000, Patryk Duda wrote:
> > > > The cros_ec_get_host_command_version_mask() function requires that the
> > > > caller hold the ec_dev->lock mutex before calling it. This requirement
> > > > was not met and as a result it was possible that two commands were sent
> > > > to the device at the same time.
> > >
> > > To clarify:
> > > - What would happen if multiple cros_ec_get_host_command_version_mask() calls
> > >   happen at the same time?
> > In the best case, the MCU will receive both commands glued together and
> > will ignore them, which results in a timeout in the kernel. In the worst
> > case, the request and/or response buffers will be corrupted.
> >
> > > - What are the callees? I'm trying to understand the source of parallelism.
> > This is a race between interrupt handling and an ioctl call from userspace.
> >
> > Interrupt handling path:
> >   cros_ec_irq_thread()
> >     cros_ec_handle_event()
> >       cros_ec_get_next_event() - Queries host command version without
> >                                  taking ec_dev->lock mutex first
> >         cros_ec_get_host_command_version_mask()
> >           cros_ec_send_command()
> >             cros_ec_xfer_command()
> >               cros_ec_uart_pkt_xfer()
> >
> > Command from userspace:
> >   cros_ec_chardev_ioctl()
> >     cros_ec_chardev_ioctl_xcmd()
> >       cros_ec_cmd_xfer() - Locks ec_dev->lock mutex before sending command
> >         cros_ec_send_command()
> >           cros_ec_xfer_command()
> >             cros_ec_uart_pkt_xfer()
> >
> > >
> > > The patch also needs an unlock at [1].
> > >
> > > [1]: https://elixir.bootlin.com/linux/v6.10/source/drivers/platform/chrome/cros_ec_proto.c#L819
> >
> > Yeah. I'll fix it in v2
>
> I'm wondering if it's simpler to just lock and unlock around calling
> cros_ec_get_host_command_version_mask(). What do you think?
>
Initially, I thought it would be good to keep the ec_dev->mkbp_event_supported
update under the mutex (similar to cros_ec_query_all(), which is called with
the mutex locked), but mkbp_event_supported is also used without the mutex held.

I don't see any obvious risks with updating the MKBP version outside the mutex.
Do you want me to change it?
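
For reference, a rough userspace sketch of the "lock and unlock only around
the call" variant discussed above. This is not the kernel patch: a pthread
mutex stands in for ec_dev->lock, and every fake_* name, the lock_held flag,
the in-flight counters and the 0x7 mask are made up for illustration. It
assumes the MKBP version is derived from the highest set bit of the version
mask.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cros_ec_device (illustration only). */
struct fake_ec_device {
	pthread_mutex_t lock;         /* models ec_dev->lock */
	bool lock_held;               /* debug flag, not in the real struct */
	int xfers_in_flight;          /* transfers currently on the wire */
	int max_in_flight;            /* high-water mark, must never exceed 1 */
	uint8_t mkbp_event_supported; /* updated outside the mutex */
};

/* Position of the highest set bit, 1-based; returns 0 for x == 0. */
static int fake_fls(uint32_t x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Models the version mask query: the caller must already hold the lock. */
static int fake_get_version_mask(struct fake_ec_device *ec, uint32_t *mask)
{
	assert(ec->lock_held);

	ec->xfers_in_flight++;
	if (ec->xfers_in_flight > ec->max_in_flight)
		ec->max_in_flight = ec->xfers_in_flight;
	*mask = 0x7; /* made-up reply: versions 0, 1 and 2 supported */
	ec->xfers_in_flight--;
	return 0;
}

/* Take the lock only for the transfer, then update the version unlocked. */
static int fake_update_mkbp_version(struct fake_ec_device *ec)
{
	uint32_t mask = 0;
	int ret;

	pthread_mutex_lock(&ec->lock);
	ec->lock_held = true;
	ret = fake_get_version_mask(ec, &mask);
	ec->lock_held = false;
	pthread_mutex_unlock(&ec->lock);

	if (ret < 0 || mask == 0)
		return -1;

	ec->mkbp_event_supported = (uint8_t)fake_fls(mask);
	return 0;
}
```

With this shape there is a single unlock right after the call, so no early
return path can leave the mutex held.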

> > > > The problem was observed while using UART backend which doesn't use any
> > > > additional locks, unlike SPI backend which locks the controller until
> > > > response is received.
> > >
> > > Is it a general issue if multiple commands are sent to the EC at a time?
> > > If yes, that should be serialized in the UART transport.
> >
> > Host Commands only support one command at a time. It's enforced by 'lock' mutex
> > from cros_ec_device structure. We just need to use it properly.
>
> I see. Please add the Fixes tag if you send a next version:
> Fixes: f74c7557ed0d ("platform/chrome: cros_ec_proto: Update version on GET_NEXT_EVENT failure")