Re: [PATCH v5] watchdog: add driver for StreamLabs USB watchdog device
From: Guenter Roeck
Date: Mon Jul 25 2022 - 23:24:36 EST
On Tue, Jul 26, 2022 at 02:31:07AM +0100, Alexey Klimov wrote:
> On Mon, Jul 25, 2022 at 3:02 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> >
> > On 7/24/22 20:06, Alexey Klimov wrote:
>
> [...]
>
> > > + * one buffer is used for communication, however transmitted message is only
> > > + * 32 bytes long
> > > + */
> > > +#define BUFFER_TRANSFER_LENGTH 32
> > > +#define BUFFER_LENGTH 64
> > > +#define USB_TIMEOUT 350
> > > +
> > Comment about the unit (ms) might be useful.
>
> Yes. I'll add it.
>
> > > +#define STREAMLABS_CMD_START 0xaacc
> > > +#define STREAMLABS_CMD_STOP 0xbbff
> > > +
> > > +/* timeout values are taken from windows program */
> > > +#define STREAMLABS_WDT_MIN_TIMEOUT 1
> > > +#define STREAMLABS_WDT_MAX_TIMEOUT 46
> > > +
> > > +struct streamlabs_wdt {
> > > + struct watchdog_device wdt_dev;
> > > + struct usb_interface *intf;
> > > + /* Serialises usb communication with a device */
> > > + struct mutex lock;
> > > + __le16 *buffer;
> > > +};
> > > +
> > > +static bool nowayout = WATCHDOG_NOWAYOUT;
> > > +module_param(nowayout, bool, 0);
> > > +MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default="
> > > + __MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
> > > +
> > > +/* USB call wrappers to send and receive messages to/from the device. */
> > > +static int usb_streamlabs_send_msg(struct usb_device *usbdev, __le16 *buf)
> > > +{
> > > + int retval;
> > > + int size;
> > > +
> > > + retval = usb_interrupt_msg(usbdev, usb_sndintpipe(usbdev, 0x02),
> > > + buf, BUFFER_TRANSFER_LENGTH,
> > > + &size, USB_TIMEOUT);
> > > +
> > > + if (size != BUFFER_TRANSFER_LENGTH)
> > > + return -EIO;
> > > +
> >
> > If usb_interrupt_msg() returns an error, it will likely not set size,
> > which may result in a random -EIO. I think this should be something like
> >
> > if (retval)
> > return retval;
> > if (size != BUFFER_TRANSFER_LENGTH)
> > return -EIO;
> >
> > return 0;
>
> Good point. I'll change it.
>
>
> > > + return retval;
> > > +}
> > > +
> > > +static int usb_streamlabs_get_msg(struct usb_device *usbdev, __le16 *buf)
> > > +{
> > > + int retval;
> > > + int size;
> > > +
> > > + retval = usb_interrupt_msg(usbdev, usb_rcvintpipe(usbdev, 0x81),
> > > + buf, BUFFER_LENGTH,
> > > + &size, USB_TIMEOUT);
> > > +
> > > + if (size != BUFFER_LENGTH)
> > > + return -EIO;
> > > +
> > Same here.
> >
> > > + return retval;
> > > +}
> > > +
> > > +/*
> > > + * This function is used to check if watchdog timeout in the received buffer
> > > + * matches the timeout passed from watchdog subsystem.
> > > + */
> > > +static int usb_streamlabs_wdt_check_timeout(__le16 *buf, unsigned long timeout)
> > > +{
> > > + if (buf[3] != cpu_to_le16(timeout))
> > > + return -EPROTO;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int usb_streamlabs_wdt_check_response(u8 *buf)
> > > +{
> > > + /*
> > > + * If watchdog device understood the command it will acknowledge
> > > + * with values 1,2,3,4 at indexes 10, 11, 12, 13 in response message
> > > + * when response treated as 8bit message.
> > > + */
> > > + if (buf[10] != 1 || buf[11] != 2 || buf[12] != 3 || buf[13] != 4)
> > > + return -EPROTO;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +/*
> > > + * This function is used to check if watchdog command in the received buffer
> > > + * matches the command passed to the device.
> > > + */
> > > +static int usb_streamlabs_wdt_check_command(__le16 *buf, u16 cmd)
> > > +{
> > > + if (buf[0] != cpu_to_le16(cmd))
> > > + return -EPROTO;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int usb_streamlabs_wdt_validate_response(__le16 *buf, u16 cmd,
> > > + unsigned long timeout_msec)
> > > +{
> > > + int retval;
> > > +
> > > + retval = usb_streamlabs_wdt_check_response((u8 *)buf);
> > > + if (retval)
> > > + return retval;
> > > +
> > > + retval = usb_streamlabs_wdt_check_command(buf, cmd);
> > > + if (retval)
> > > + return retval;
> > > +
> > > + retval = usb_streamlabs_wdt_check_timeout(buf, timeout_msec);
> > > + return retval;
> >
> > assignment to retval is unnecessary.
>
> Ok.
>
> > > +}
> > > +
> > > +static void usb_streamlabs_wdt_prepare_buf(__le16 *buf, u16 cmd,
> > > + unsigned long timeout_msec)
> > > +{
> > > + /*
> > > + * remaining elements expected to be zero everytime during
> > > + * communication
> > > + */
> > > + buf[0] = cpu_to_le16(cmd);
> > > + buf[1] = cpu_to_le16(0x8000);
> > > + buf[3] = cpu_to_le16(timeout_msec);
> >
> > Not setting buf[2] and buf[4] contradicts the comment above. Maybe
> > those offsets are not _expected_ to be set by the device, but that
> > is not guaranteed. It may be safer to just use memset() at the
> > beginning of the function to clear the buffer.
>
> Sure. I guess it makes sense to zero the buffer before reading the
> message from the device too?
> Before usb_streamlabs_get_msg(usbdev, wdt->buffer).
>
That should not be necessary unless your code accesses data beyond the
amount of data received (which it should not do in the first place).
> > > + buf[5] = 0x0;
> > > + buf[6] = 0x0;
> > > +}
> > > +
> > > +static int __usb_streamlabs_wdt_cmd(struct streamlabs_wdt *wdt, u16 cmd)
> > > +{
> > > + struct usb_device *usbdev;
> > > + unsigned long timeout_msec;
> > > + /* how many times to re-try getting the state of the device */
> > > + unsigned int retry_counter = 10;
> > > + int retval;
> > > +
> > > + if (unlikely(!wdt->intf))
> > > + return -ENODEV;
> > > +
> > > + usbdev = interface_to_usbdev(wdt->intf);
> > > + timeout_msec = wdt->wdt_dev.timeout * MSEC_PER_SEC;
> > > +
> > > + usb_streamlabs_wdt_prepare_buf(wdt->buffer, cmd, timeout_msec);
> > > +
> > > + /* send command to watchdog */
> > > + retval = usb_streamlabs_send_msg(usbdev, wdt->buffer);
> > > + if (retval)
> > > + return retval;
> > > +
> > > + /*
> > > + * Transition from one state to another in this device
> > > + * doesn't happen immediately, especially stopping the device
> > > + * is not observed on the first reading of the response.
> > > + * Plus to avoid potentially stale response message in the device
> > > + * we keep reading the state of the device until we see:
> > > + * -- that device recognised the sent command;
> > > + * -- the received state (started or stopped) matches the state
> > > + * that was requested;
> > > + * -- the timeout passed matches the timeout value read from
> > > + * the device.
> > > + * Keep retrying 10 times and if watchdog device doesn't respond
> > > + * correctly as expected we bail out and return an error.
> > > + */
> > > + do {
> > > + retval = usb_streamlabs_get_msg(usbdev, wdt->buffer);
> > > + if (retval)
> > > + break;
> > > +
> > > + retval = usb_streamlabs_wdt_validate_response(wdt->buffer, cmd,
> > > + timeout_msec);
> > > + } while (retval && retry_counter--);
> > > +
> > > + return retry_counter ? retval : -EIO;
> > > +}
> > > +
> > > +static int usb_streamlabs_wdt_cmd(struct streamlabs_wdt *streamlabs_wdt, u16 cmd)
> > > +{
> > > + int retval;
> > > +
> > > + mutex_lock(&streamlabs_wdt->lock);
> > > + retval = __usb_streamlabs_wdt_cmd(streamlabs_wdt, cmd);
> > > + mutex_unlock(&streamlabs_wdt->lock);
> > > +
> > > + return retval;
> > > +}
> > > +
> > > +static int usb_streamlabs_wdt_start(struct watchdog_device *wdt_dev)
> > > +{
> > > + struct streamlabs_wdt *streamlabs_wdt = watchdog_get_drvdata(wdt_dev);
> > > +
> > > + return usb_streamlabs_wdt_cmd(streamlabs_wdt, STREAMLABS_CMD_START);
> > > +}
> > > +
> > > +static int usb_streamlabs_wdt_stop(struct watchdog_device *wdt_dev)
> > > +{
> > > + struct streamlabs_wdt *streamlabs_wdt = watchdog_get_drvdata(wdt_dev);
> > > +
> > > + return usb_streamlabs_wdt_cmd(streamlabs_wdt, STREAMLABS_CMD_STOP);
> > > +}
> > > +
> > > +static const struct watchdog_info streamlabs_wdt_ident = {
> > > + .options = WDIOF_SETTIMEOUT | WDIOF_KEEPALIVEPING,
> > > + .identity = DRIVER_NAME,
> > > +};
> > > +
> > > +static const struct watchdog_ops usb_streamlabs_wdt_ops = {
> > > + .owner = THIS_MODULE,
> > > + .start = usb_streamlabs_wdt_start,
> > > + .stop = usb_streamlabs_wdt_stop,
> > > +};
> > > +
> > > +static int usb_streamlabs_wdt_probe(struct usb_interface *intf,
> > > + const struct usb_device_id *id)
> > > +{
> > > + struct usb_device *usbdev = interface_to_usbdev(intf);
> > > + struct streamlabs_wdt *streamlabs_wdt;
> > > + int retval;
> > > +
> > > + /*
> > > + * USB IDs of this device appear to be weird/unregistered. Hence, do
> > > + * an additional check on product and manufacturer.
> > > + * If there is similar device in the field with same values then
> > > + * there is stop command in probe() below that checks if the device
> > > + * behaves as a watchdog.
> > > + */
> > > + if (!usbdev->product || !usbdev->manufacturer ||
> > > + strncmp(usbdev->product, "USBkit", 6) ||
> > > + strncmp(usbdev->manufacturer, "STREAM LABS", 11))
> > > + return -ENODEV;
> > > +
> > > + streamlabs_wdt = devm_kzalloc(&intf->dev, sizeof(struct streamlabs_wdt),
> > > + GFP_KERNEL);
> > > + if (!streamlabs_wdt)
> > > + return -ENOMEM;
> > > +
> > > + streamlabs_wdt->buffer = devm_kzalloc(&intf->dev, BUFFER_LENGTH,
> > > + GFP_KERNEL);
> > > + if (!streamlabs_wdt->buffer)
> > > + return -ENOMEM;
> > > +
> >
> > Nit: buffer could be made part of struct streamlabs_wdt and be tagged with
> > ____cacheline_aligned to avoid the double allocation.
>
> It was discussed in the past.
> https://lore.kernel.org/linux-watchdog/5714E7D3.4030809@xxxxxxxxxxxx/
> https://lore.kernel.org/linux-watchdog/1460988518.25119.28.camel@xxxxxxxx/
>
> The conclusion there was that with separate allocation it is
> guaranteed to not share a cacheline with mutex lock.
> Do we know for sure that it is safe with ____cacheline_aligned attribute?
>
> Oliver, thoughts?
>
> I see that a lot of drivers use cacheline alignment for buffers, so I
> guess that should be okay nowadays and I can change it back to initial
> version with cacheline alignment.
>
If you don't trust __cacheline_aligned, please add a respective comment
explaining that to avoid this from coming up again.
> > > + mutex_init(&streamlabs_wdt->lock);
> > > +
> > > + streamlabs_wdt->wdt_dev.info = &streamlabs_wdt_ident;
> > > + streamlabs_wdt->wdt_dev.ops = &usb_streamlabs_wdt_ops;
> > > + streamlabs_wdt->wdt_dev.timeout = STREAMLABS_WDT_MAX_TIMEOUT;
> > > + streamlabs_wdt->wdt_dev.max_timeout = STREAMLABS_WDT_MAX_TIMEOUT;
> > > + streamlabs_wdt->wdt_dev.min_timeout = STREAMLABS_WDT_MIN_TIMEOUT;
> > > + streamlabs_wdt->wdt_dev.parent = &intf->dev;
> > > +
> > > + streamlabs_wdt->intf = intf;
> > > + usb_set_intfdata(intf, &streamlabs_wdt->wdt_dev);
> > > + watchdog_set_drvdata(&streamlabs_wdt->wdt_dev, streamlabs_wdt);
> > > + watchdog_set_nowayout(&streamlabs_wdt->wdt_dev, nowayout);
> > > +
> > > + retval = usb_streamlabs_wdt_stop(&streamlabs_wdt->wdt_dev);
> > > + if (retval)
> > > + return -ENODEV;
> > > +
> >
> > A comment explaining why the watchdog is explicitly stopped when running
> > might be useful.
>
> What do you mean by saying "when running"?
> Everytime during my testing the initial state is "stopped" on
> boot/power on/after reset, so not sure what you mean by saying "when
> running".
Should I have used the term "active" ? "started" ?
> There is a comment above that explains the stop command but I will
> add/change comments that explain things better.
> The point of executing "stop" command here is to check that device
> being probed behaves like we expect it to. This is a bit paranoid
> check since I am a not 100% sure that all devices with such USB ids
> are watchdogs -- that's why additional checks for usbdev->product and
> usbdev->manufacturer and this stop command that doesn't change the
> initial state. In theory, I can remove this stop command at all.
>
Normally one does not want a watchdog to stop if it is running (started ?
active ? pick your term) when the device is instantiated to avoid gaps
in watchdog coverage. The watchdog core provides a flag, WDOG_HW_RUNNING,
for this purpose (sorry, there is the 'running' term again). It is used
by many drivers, and ensures that the time from loading the Linux kernel
to opening the watchdog device is protected by the watchdog.
Calling the stop function during probe suggests that you at least
considered the possibility that the watchdog may be running/started/active.
Per your explanation, you still want it to stop explicitly if/when that
happens. All I am asking you is to add a comment explaining why this is
not needed/wanted/relevant/supported for this driver. One explanation
might, for example, be that the state can not be detected reliably.
Thanks,
Guenter
> Thank you for the review.
>
> [...]
>
> Best regards,
> Alexey