Re: [PATCH v12 10/17] counter: Add character device interface

From: William Breathitt Gray
Date: Mon Jul 12 2021 - 06:29:11 EST


On Sun, Jul 11, 2021 at 01:20:03PM +0100, Jonathan Cameron wrote:
> On Mon, 5 Jul 2021 17:18:58 +0900
> William Breathitt Gray <vilhelm.gray@xxxxxxxxx> wrote:
>
> > This patch introduces a character device interface for the Counter
> > subsystem. Device data is exposed through standard character device read
> > operations. Device data is gathered when a Counter event is pushed by
> > the respective Counter device driver. Configuration is handled via ioctl
> > operations on the respective Counter character device node.
> >
> > Cc: David Lechner <david@xxxxxxxxxxxxxx>
> > Cc: Gwendal Grignou <gwendal@xxxxxxxxxxxx>
> > Cc: Dan Carpenter <dan.carpenter@xxxxxxxxxx>
> > Cc: Oleksij Rempel <o.rempel@xxxxxxxxxxxxxx>
> > Signed-off-by: William Breathitt Gray <vilhelm.gray@xxxxxxxxx>
> > ---
> > drivers/counter/Makefile | 2 +-
> > drivers/counter/counter-chrdev.c | 494 +++++++++++++++++++++++++++++++
> > drivers/counter/counter-chrdev.h | 14 +
> > drivers/counter/counter-core.c | 44 ++-
> > include/linux/counter.h | 45 +++
> > include/uapi/linux/counter.h | 77 +++++
> > 6 files changed, 670 insertions(+), 6 deletions(-)
> > create mode 100644 drivers/counter/counter-chrdev.c
> > create mode 100644 drivers/counter/counter-chrdev.h
> >
> > diff --git a/drivers/counter/Makefile b/drivers/counter/Makefile
> > index 1ab7e087fdc2..8fde6c100ebc 100644
> > --- a/drivers/counter/Makefile
> > +++ b/drivers/counter/Makefile
> > @@ -4,7 +4,7 @@
> > #
> >
> > obj-$(CONFIG_COUNTER) += counter.o
> > -counter-y := counter-core.o counter-sysfs.o
> > +counter-y := counter-core.o counter-sysfs.o counter-chrdev.o
> >
> > obj-$(CONFIG_104_QUAD_8) += 104-quad-8.o
> > obj-$(CONFIG_INTERRUPT_CNT) += interrupt-cnt.o
> > diff --git a/drivers/counter/counter-chrdev.c b/drivers/counter/counter-chrdev.c
> > new file mode 100644
> > index 000000000000..92805b1f65b8
> > --- /dev/null
> > +++ b/drivers/counter/counter-chrdev.c
> > @@ -0,0 +1,494 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Generic Counter character device interface
> > + * Copyright (C) 2020 William Breathitt Gray
> > + */
> > +
> > +#include <linux/cdev.h>
> > +#include <linux/counter.h>
> > +#include <linux/err.h>
> > +#include <linux/errno.h>
> > +#include <linux/export.h>
> > +#include <linux/fs.h>
> > +#include <linux/kfifo.h>
> > +#include <linux/list.h>
> > +#include <linux/mutex.h>
> > +#include <linux/nospec.h>
> > +#include <linux/poll.h>
> > +#include <linux/slab.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/timekeeping.h>
> > +#include <linux/types.h>
> > +#include <linux/wait.h>
> > +#include <linux/uaccess.h>
> > +
> > +#include "counter-chrdev.h"
> > +
> > +struct counter_comp_node {
> > + struct list_head l;
> > + struct counter_component component;
> > + struct counter_comp comp;
> > + void *parent;
> > +};
> > +
> > +static ssize_t counter_chrdev_read(struct file *filp, char __user *buf,
> > + size_t len, loff_t *f_ps)
> > +{
> > + struct counter_device *const counter = filp->private_data;
> > + int err;
> > + unsigned int copied;
> > +
> > + if (len < sizeof(struct counter_event))
> > + return -EINVAL;
>
> There is a lot of discussion going on buried in a rust on linux thread
> around the use of devm when chardevs are involved. I'd kind of forgotten
> that Lars-Peter Clausen did a lot of work to make IIO safe to these races
> some time ago. One of those elements was to make we dropped out quickly
> from read functions if we were on the way 'down'. Could you make sure to
> run some tests to ensure we are safe with driver unbinds when the cdev is
> still open? Another part of that was to ensure a blocking read unblocks
> when the device goes away (with an error of course!) Some of this stuff
> isn't 'necessary' for correctness, but it is desirable for device removal
> to occur in finite time.
>
> https://lore.kernel.org/ksummit/CANiq72nkNrekzbxMci6vW02w=Q2L-SVTk_U4KN_LT8u_b=YPgw@xxxxxxxxxxxxxx/T/#m6db86a574237c22a32ecf49b596b3c2917967c5e
>
> Note I want to take another look at the IIO code around this as well
> just in case we missed anything that has come up in that discussion.
> I think we are fine but maybe can move to more 'standard' code patterns
> if those get formalised.
>
> Anyhow, it's fiddly stuff, so make sure to test those cases.
>
> Jonathan

Thank you for the heads-up. It'll be a shame if all the simplification
we got by using devm_* for counter-sysfs.c has to be undone, but I'll
investigate this first and see if we're all right as we are now; I also
suspect we are fine given how many times we've already tested these
patchset revisions, but I'll run some deliberate tests for these
particular cases just to be sure.

> > +
> > + do {
> > + if (kfifo_is_empty(&counter->events)) {
> > + if (filp->f_flags & O_NONBLOCK)
> > + return -EAGAIN;
> > +
> > + err = wait_event_interruptible(counter->events_wait,
> > + !kfifo_is_empty(&counter->events));
> > + if (err < 0)
> > + return err;
> > + }
> > +
> > + if (mutex_lock_interruptible(&counter->events_lock))
> > + return -ERESTARTSYS;
> > + err = kfifo_to_user(&counter->events, buf, len, &copied);
> > + mutex_unlock(&counter->events_lock);
> > + if (err < 0)
> > + return err;
> > + } while (!copied);
> > +
> > + return copied;
> > +}
> > +
> > +static __poll_t counter_chrdev_poll(struct file *filp,
> > + struct poll_table_struct *pollt)
> > +{
> > + struct counter_device *const counter = filp->private_data;
> > + __poll_t events = 0;
> > +
> > + poll_wait(filp, &counter->events_wait, pollt);
> > +
> > + if (!kfifo_is_empty(&counter->events))
> > + events = EPOLLIN | EPOLLRDNORM;
> > +
> > + return events;
> > +}
> > +
> > +static void counter_events_list_free(struct list_head *const events_list)
> > +{
> > + struct counter_event_node *p, *n;
> > + struct counter_comp_node *q, *o;
> > +
> > + list_for_each_entry_safe(p, n, events_list, l) {
> > + /* Free associated component nodes */
> > + list_for_each_entry_safe(q, o, &p->comp_list, l) {
> > + list_del(&q->l);
> > + kfree(q);
> > + }
> > +
> > + /* Free event node */
> > + list_del(&p->l);
> > + kfree(p);
> > + }
> > +}
> > +
> > +static int counter_set_event_node(struct counter_device *const counter,
> > + struct counter_watch *const watch,
> > + const struct counter_comp_node *const cfg)
> > +{
> > + unsigned long flags;
> > + struct counter_event_node *event_node;
> > + int err = 0;
> > + struct counter_comp_node *comp_node;
> > +
> > + spin_lock_irqsave(&counter->events_list_lock, flags);
> > +
> > + /* Search for event in the list */
> > + list_for_each_entry(event_node, &counter->next_events_list, l)
> > + if (event_node->event == watch->event &&
> > + event_node->channel == watch->channel)
> > + break;
> > +
> > + /* If event is not already in the list */
> > + if (&event_node->l == &counter->next_events_list) {
> > + /* Allocate new event node */
> > + event_node = kmalloc(sizeof(*event_node), GFP_ATOMIC);
> > + if (!event_node) {
> > + err = -ENOMEM;
> > + goto exit_early;
> > + }
> > +
> > + /* Configure event node and add to the list */
> > + event_node->event = watch->event;
> > + event_node->channel = watch->channel;
> > + INIT_LIST_HEAD(&event_node->comp_list);
> > + list_add(&event_node->l, &counter->next_events_list);
> > + }
> > +
> > + /* Check if component watch has already been set before */
> > + list_for_each_entry(comp_node, &event_node->comp_list, l)
> > + if (comp_node->parent == cfg->parent &&
> > + comp_node->comp.count_u8_read == cfg->comp.count_u8_read) {
> > + err = -EINVAL;
> > + goto exit_early;
> > + }
> > +
> > + /* Allocate component node */
> > + comp_node = kmalloc(sizeof(*comp_node), GFP_ATOMIC);
> > + if (!comp_node) {
> > + /* Free event node if no one else is watching */
> > + if (list_empty(&event_node->comp_list)) {
> > + list_del(&event_node->l);
> > + kfree(event_node);
> > + }
> > + err = -ENOMEM;
> > + goto exit_early;
> > + }
> > + *comp_node = *cfg;
> > +
> > + /* Add component node to event node */
> > + list_add_tail(&comp_node->l, &event_node->comp_list);
> > +
> > +exit_early:
> > + spin_unlock_irqrestore(&counter->events_list_lock, flags);
> > +
> > + return err;
> > +}
> > +
> > +static int counter_disable_events(struct counter_device *const counter)
> > +{
> > + unsigned long flags;
> > + int err = 0;
> > +
> > + spin_lock_irqsave(&counter->events_list_lock, flags);
> > +
> > + counter_events_list_free(&counter->events_list);
> > +
> > + if (counter->ops->events_configure)
> > + err = counter->ops->events_configure(counter);
> > +
> > + spin_unlock_irqrestore(&counter->events_list_lock, flags);
> > +
> > + counter_events_list_free(&counter->next_events_list);
> > +
> > + return err;
> > +}
> > +
> > +static int counter_add_watch(struct counter_device *const counter,
> > + const unsigned long arg)
> > +{
> > + void __user *const uwatch = (void __user *)arg;
> > + struct counter_watch watch;
> > + struct counter_comp_node comp_node = {};
> > + size_t parent, id;
> > + struct counter_comp *ext;
> > + size_t num_ext;
> > + int err;
> > +
> > + if (copy_from_user(&watch, uwatch, sizeof(watch)))
> > + return -EFAULT;
> > +
> > + if (watch.component.type == COUNTER_COMPONENT_NONE)
> > + goto no_component;
> > +
> > + parent = watch.component.parent;
> > +
> > + /* Configure parent component info for comp node */
> > + switch (watch.component.scope) {
> > + case COUNTER_SCOPE_DEVICE:
> > + ext = counter->ext;
> > + num_ext = counter->num_ext;
> > + break;
> > + case COUNTER_SCOPE_SIGNAL:
> > + if (parent >= counter->num_signals)
> > + return -EINVAL;
> > + parent = array_index_nospec(parent, counter->num_signals);
> > +
> > + comp_node.parent = counter->signals + parent;
> > +
> > + ext = counter->signals[parent].ext;
> > + num_ext = counter->signals[parent].num_ext;
> > + break;
> > + case COUNTER_SCOPE_COUNT:
> > + if (parent >= counter->num_counts)
> > + return -EINVAL;
> > + parent = array_index_nospec(parent, counter->num_counts);
> > +
> > + comp_node.parent = counter->counts + parent;
> > +
> > + ext = counter->counts[parent].ext;
> > + num_ext = counter->counts[parent].num_ext;
> > + break;
> > + default:
> > + return -EINVAL;
> > + }
> > +
> > + id = watch.component.id;
> > +
> > + /* Configure component info for comp node */
> > + switch (watch.component.type) {
> > + case COUNTER_COMPONENT_SIGNAL:
> > + if (watch.component.scope != COUNTER_SCOPE_SIGNAL)
> > + return -EINVAL;
> > +
> > + comp_node.comp.type = COUNTER_COMP_SIGNAL_LEVEL;
> > + comp_node.comp.signal_u32_read = counter->ops->signal_read;
> > + break;
> > + case COUNTER_COMPONENT_COUNT:
> > + if (watch.component.scope != COUNTER_SCOPE_COUNT)
> > + return -EINVAL;
> > +
> > + comp_node.comp.type = COUNTER_COMP_U64;
> > + comp_node.comp.count_u64_read = counter->ops->count_read;
> > + break;
> > + case COUNTER_COMPONENT_FUNCTION:
> > + if (watch.component.scope != COUNTER_SCOPE_COUNT)
> > + return -EINVAL;
> > +
> > + comp_node.comp.type = COUNTER_COMP_FUNCTION;
> > + comp_node.comp.count_u32_read = counter->ops->function_read;
> > + break;
> > + case COUNTER_COMPONENT_SYNAPSE_ACTION:
> > + if (watch.component.scope != COUNTER_SCOPE_COUNT)
> > + return -EINVAL;
> > + if (id >= counter->counts[parent].num_synapses)
> > + return -EINVAL;
> > + id = array_index_nospec(id, counter->counts[parent].num_synapses);
> > +
> > + comp_node.comp.type = COUNTER_COMP_SYNAPSE_ACTION;
> > + comp_node.comp.action_read = counter->ops->action_read;
> > + comp_node.comp.priv = counter->counts[parent].synapses + id;
> > + break;
> > + case COUNTER_COMPONENT_EXTENSION:
> > + if (id >= num_ext)
> > + return -EINVAL;
> > + id = array_index_nospec(id, num_ext);
> > +
> > + comp_node.comp = ext[id];
> > + break;
> > + default:
> > + return -EINVAL;
> > + }
> > + /* Check if any read callback is set; this is part of a union */
> > + if (!comp_node.comp.count_u8_read)
> > + return -EOPNOTSUPP;
> > +
> > +no_component:
> > + if (counter->ops->watch_validate) {
> > + err = counter->ops->watch_validate(counter, &watch);
> > + if (err < 0)
> > + return err;
> > + }
> > +
> > + comp_node.component = watch.component;
> > +
> > + return counter_set_event_node(counter, &watch, &comp_node);
> > +}
> > +
> > +static long counter_chrdev_ioctl(struct file *filp, unsigned int cmd,
> > + unsigned long arg)
> > +{
> > + struct counter_device *const counter = filp->private_data;
> > + unsigned long flags;
> > + int err = 0;
> > +
> > + switch (cmd) {
> > + case COUNTER_ADD_WATCH_IOCTL:
> > + return counter_add_watch(counter, arg);
> > + case COUNTER_ENABLE_EVENTS_IOCTL:
> > + spin_lock_irqsave(&counter->events_list_lock, flags);
> > +
> > + counter_events_list_free(&counter->events_list);
> > + list_replace_init(&counter->next_events_list,
> > + &counter->events_list);
> > +
> > + if (counter->ops->events_configure)
> > + err = counter->ops->events_configure(counter);
> > +
> > + spin_unlock_irqrestore(&counter->events_list_lock, flags);
> > + return err;
> > + case COUNTER_DISABLE_EVENTS_IOCTL:
> > + return counter_disable_events(counter);
> > + default:
> > + return -ENOIOCTLCMD;
> > + }
> > +}
> > +
> > +static int counter_chrdev_open(struct inode *inode, struct file *filp)
> > +{
> > + struct counter_device *const counter = container_of(inode->i_cdev,
> > + typeof(*counter),
> > + chrdev);
> > +
>
> What stops multiple simultaneous openings? I'm going to assume this isn't
> safe to those, or at least that crazy things could happen if you had it
> open twice at the same time.

We're "safe" in the sense that there are locks protecting data access
(no risk of dereferencing a NUll pointer, etc.), so I don't expect any
crashes if there are multiple opens to the chrdev. However, I coded this
with the expectation of a single open, so in such as scenario where we
have multiple opens I suspect some data would be not appear, or maybe
only appear partially, because the kfifo data is cleared when read.

In patch 15 I deliberately restrict the chrdev to a single open via
counter->chrdev_lock in order to support the events_queue_size sysfs
attribute. I could move it to this patch, but it may not be necessary
when both patches will likely be picked up together and we have no
critical issues with opening twice here.

William Breathitt Gray

Attachment: signature.asc
Description: PGP signature