Re: [PATCH v8 04/20] dlb: add device ioctl layer and first three ioctls

From: Dan Williams
Date: Thu Jan 21 2021 - 18:15:39 EST


On Wed, Jan 13, 2021 at 1:56 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
[..]
> > That's not my concern though. The open race that cdev_del() does not
> > address is ioctl() called after device-unbind. The open fd is never
> > revoked and can live past device_unregister() in which case the ioctl
> > needs to revalidate the device, or (not recommended) block unbind /
> > driver-remove while open file descriptors are outstanding.
>
> A cdev is to track the character device, so the open/close/ioctl issue
> should not be relevant here.
>
> Device unbind is totally different and has nothing to do with the
> character device node, except where you are trying to tie the two
> together. And yes, you do have to be aware of that, but usually is it
> quite simple. Complex examples are the v4l layer where the distance
> between the two devices is great, so the middle layer has to handle
> things carefully.
>
> For a "simple" driver like this one, there shouldn't be any issues and
> it should be hard to get this wrong :)

It's trivial to trigger these problems in a bind/unbind test.

>
> > > > It strikes me that a helper like this might address many of the common patterns:
> > > >
> > > > +/**
> > > > + * get_live_device() - increment reference count for device iff !dead
> > > > + * @dev: device.
> > > > + *
> > > > + * Forward the call to get_device() if the device is still alive. If
> > > > + * this is called with the device_lock() held then the device is
> > > > + * guaranteed to not die until the device_lock() is dropped.
> > > > + */
> > > > +struct device *get_live_device(struct device *dev)
> > > > +{
> > > > + return dev && !dev->p->dead ? get_device(dev) : NULL;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(get_live_device);
> > >
> > > Ick, no, that's still racy :(
> >
> > Hence the comment about needing to synchronize with the driver doing
> > device_unregister().
>
> If you save off your device pointer properly on probe (i.e. you grab a
> reference count as you "know" it is live), then you don't have any of
> these races or need to synchronize. So again, this should be hard to
> get wrong, unless you have a "heavy" middle layer between the char
> device node and the device structure.

Hold an fd open and continue to issue file_operations while the driver
is torn down.

> > > And a cdev is not a "real" struct device, despite looking like one if
> > > you squint at it. The patches from Christoph should be merged soon that
> > > remove the last remants of the logic that kind of looked like a struct
> > > device operation (with a kobject), and after that, I will clean it out
> > > to keep it from looking like it ties into the driver model entirely, as
> > > many people get this wrong, because it does not.
> >
> > Agree, but many drivers still tie cdev lifetime to a struct device.
>
> True, and that's normally fine. Do you have examples of where this is
> wrong in the tree?
>
> > > > Alternatively, if device_lock() is too awkward for a driver it could
> > > > use its own lock and kill_device().
> > > >
> > > > ...am I off base thinking that cdev_del vs fops liveness is a
> > > > widespread problem worth documenting new design patterns?
> > >
> > > It shouldn't be a problem, again, because who would be able to close a
> > > char device node and still be able to call ioctl on it? The VFS layer
> > > should prevent that from happening, right?
> >
> > Per above, unbind vs and revoking new ioctl() invocations is the concern.
>
> Luckily unbind is a very rare occurance (with the exception of the
> virtio drivers, who seem to love it and use it as their only user/kernel
> api), so this shouldn't be an issue in normal operations of a system.

I agree it's a small window and an infrequently stressed window under
normal conditions. However, it's the same kind of problem that lead to
the need to teach /dev/mem to revoke existing mappings. It's the
synchronization that is needed in the device-dax layer when it is
switching modes from device-mmap-only to online-memory-hotplug the
synchronization with in-flight mmap is needed to avoid collisions. The
libnvdimm subsystem is a heavy user of bind/unbind for applying
different personalities in front of pmem.

I'll advise Mike to not worry about it, I don't want the resolution of
this debate to slow him down if you don't see an issue here.