Re: [PATCH 0/3] cdev: Generic shutdown handling

From: Dan Williams
Date: Sat Jan 30 2021 - 04:12:01 EST


On Wed, Jan 20, 2021 at 11:38 AM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> After reviewing driver submissions with new cdev + ioctl usages one
> common stumbling block is coordinating the shutdown of the ioctl path,
> or other file operations, at driver ->remove() time. While cdev_del()
> guarantees that no new file descriptors will be established, operations
> on existing file descriptors can proceed indefinitely.
>
> Given the observation that the kernel spends the resources for a percpu_ref
> per request_queue shared with all block_devices on a gendisk, do the
> same for all the cdev instances that share the same
> cdev_add()-to-cdev_del() lifetime.
>
> With this in place cdev_del() not only guarantees 'no new opens', but it
> also guarantees 'no new operations invocations' and 'all threads running
> in an operation handler have exited that handler'.

Prompted by the reaction I realized that this is pushing an incomplete
story about why this is needed, and the "queued" concept is way off
base. The problem this is trying to solve is situations like this:

long xyz_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
	xyz_ioctl_dev = file->private_data;
	/* context is devm-allocated by the parent driver, see xyz_probe() */
	xyz_driver_context = xyz_ioctl_dev->context;
	...
}

int xyz_probe(struct device *dev)
{
	/* freed when the parent driver unbinds from @dev */
	xyz_driver_context = devm_kzalloc(...);
	...
	xyz_ioctl_dev = kmalloc(...);
	device_initialize(&xyz_ioctl_dev->dev);
	xyz_ioctl_dev->context = xyz_driver_context;
	...
	return cdev_device_add(&xyz_ioctl_dev->cdev, &xyz_ioctl_dev->dev);
}

...where a parent driver allocates context tied to the driver-bind
lifetime of the parent device, and that context ends up getting
used in the ioctl path. I interpret Greg's assertion "if you do this
right you don't have this problem" as "don't reference anything with a
lifetime shorter than the xyz_ioctl_dev lifetime in your ioctl
handler". That is true, but it can be awkward to constrain
xyz_driver_context to a sub-device, and doing so gives up some of the
convenience of devm. So the goal is to have a cdev api that accounts
for all the common lifetimes when devm is in use. I'm now thinking
of an api like:

devm_cdev_device_add(struct device *host, struct cdev *cdev,
		     struct device *dev)

...where @host bounds the lifetime of data used by the cdev
file_operations, and @dev is the typical containing structure for
@cdev. Internally I would refactor the debugfs mechanism for flushing
in-flight file_operations so that it is shared with the cdev
implementation. That means either adopting the debugfs method for
file_operations syncing, or switching debugfs to percpu_ref (leaning
towards the former).
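
To make the intended semantics concrete, here is a rough userspace model
of that style of syncing: a "dead" flag plus an active-users count that
starts with a bias of 1, loosely patterned on what debugfs_file_get() /
debugfs_file_put() do as I understand them. Every name below (ops_gate_*,
xyz_ioctl_model, the struct layouts) is made up for illustration and is
not an existing or proposed kernel interface. A real implementation
would sleep on a completion where this sketch just returns a bool.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code. */
struct ops_gate {
	atomic_bool dead;        /* set once teardown has started */
	atomic_int active_users; /* bias of 1 + one per in-flight operation */
};

static inline void ops_gate_init(struct ops_gate *g)
{
	atomic_init(&g->dead, false);
	atomic_init(&g->active_users, 1);
}

/* Called on entry to a file operation. Fails once teardown has begun. */
static inline bool ops_gate_enter(struct ops_gate *g)
{
	int old;

	if (atomic_load(&g->dead))
		return false;

	/* Increment only if the count has not already dropped to zero. */
	old = atomic_load(&g->active_users);
	while (old != 0) {
		if (atomic_compare_exchange_weak(&g->active_users, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Called on exit from a file operation. Returns true when this was the
 * last user after teardown dropped the bias, i.e. the point where kernel
 * code would wake the thread waiting in teardown.
 */
static inline bool ops_gate_exit(struct ops_gate *g)
{
	int old = atomic_fetch_sub(&g->active_users, 1);

	assert(old > 0);
	return old == 1;
}

/*
 * Start teardown: refuse new entries and drop the bias. Returns true if
 * nothing is in flight, false if the caller has to wait for the final
 * ops_gate_exit() before freeing anything the handlers use.
 */
static inline bool ops_gate_kill(struct ops_gate *g)
{
	atomic_store(&g->dead, true);
	return atomic_fetch_sub(&g->active_users, 1) == 1;
}

struct xyz_driver_context {
	int value;
};

struct xyz_ioctl_dev {
	struct ops_gate gate;
	struct xyz_driver_context *context; /* shorter-lived than the cdev */
};

/* Stand-in for xyz_ioctl(): touch the context only while inside the gate. */
static inline long xyz_ioctl_model(struct xyz_ioctl_dev *d)
{
	long ret;

	if (!ops_gate_enter(&d->gate))
		return -ENODEV;
	ret = d->context->value;
	ops_gate_exit(&d->gate);
	return ret;
}
```

In this model the @host teardown path would call ops_gate_kill() and, if
it returns false, wait for the last ops_gate_exit() before the
devm-allocated context is released. Handlers that lose the race get
-ENODEV instead of dereferencing freed memory.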

Does this clarify the raised concerns?