Re: [PATCH v5 1/8] drivers: kunit: Generic helpers for test device creation
From: Jonathan Cameron
Date: Sat Apr 01 2023 - 11:15:46 EST
On Sun, 26 Mar 2023 10:16:54 -0700
Lars-Peter Clausen <lars@xxxxxxxxxx> wrote:
> On 3/25/23 10:50, Jonathan Cameron wrote:
> > On Fri, 24 Mar 2023 13:31:57 +0100
> > Maxime Ripard <maxime@xxxxxxxxxx> wrote:
> >
> >> On Fri, Mar 24, 2023 at 08:11:52AM +0200, Matti Vaittinen wrote:
> >>> On 3/23/23 18:36, Maxime Ripard wrote:
> >>>> On Thu, Mar 23, 2023 at 03:02:03PM +0200, Matti Vaittinen wrote:
> >>>>> On 3/23/23 14:29, Maxime Ripard wrote:
> >>>>>> On Thu, Mar 23, 2023 at 02:16:52PM +0200, Matti Vaittinen wrote:
> >>>>>>
> >>>>>> This is the description of what was happening:
> >>>>>> https://lore.kernel.org/dri-devel/20221117165311.vovrc7usy4efiytl@houat/
> >>>>> Thanks Maxime. Do I read this correcty. The devm_ unwinding not being done
> >>>>> when root_device_register() is used is not because root_device_unregister()
> >>>>> would not trigger the unwinding - but rather because DRM code on top of this
> >>>>> device keeps the refcount increased?
> >>>> There's a difference of behaviour between a root_device and any device
> >>>> with a bus: the root_device will only release the devm resources when
> >>>> it's freed (in device_release), but a bus device will also do it in
> >>>> device_del (through bus_remove_device() -> device_release_driver() ->
> >>>> device_release_driver_internal() -> __device_release_driver() ->
> >>>> device_unbind_cleanup(), which are skipped (in multiple places) if
> >>>> there's no bus and no driver attached to the device).
> >>>>
> >>>> It does affect DRM, but I'm pretty sure it will affect any framework
> >>>> that deals with device hotplugging by deferring the framework structure
> >>>> until the last (userspace) user closes its file descriptor. So I'd
> >>>> assume that v4l2 and cec at least are also affected, and most likely
> >>>> others.
> >>> Thanks for the explanation and patience :)
> >>>
> >>>>
> >>>>> If this is the case, then it sounds like a DRM specific issue to me.
> >>>> I mean, I guess. One could also argue that it's because IIO doesn't
> >>>> properly deal with hotplugging.
> >>> I must say I haven't been testing the IIO registration API. I've only tested
> >>> the helper API which is not backed up by any "IIO device". (This is fine for
> >>> the helper because it must by design be cleaned-up only after the
> >>> IIO-deregistration).
> >>>
> >>> After your explanation here, I am not convinced IIO wouldn't see the same
> >>> issue if I was testing the devm_iio_device_alloc() & co.
> >> It depends really. The issue DRM is trying to solve is that, when a
> >> device is gone, some application might still have an open FD and could
> >> still poke into the kernel, while all the resources would have been
> >> free'd if it was using devm.
> >>
> >> So everything is kept around until the last fd is closed, so you still
> >> have a reference to the device (even though it's been removed from its
> >> bus) until that time.
> >>
> >> It could be possible that IIO just doesn't handle that case at all. I
> >> guess most of the devices aren't hotpluggable, and there's not much to
> >> interact with from a userspace PoV iirc, so it might be why.
> > Lars-Peter Clausen (IIRC) fixed up the IIO handling of the similar cases a
> > long time ago now. It's simpler that for some other subsystems as we don't
> > have as many interdependencies as occur in DRM etc.
> >
> > I 'think' we are fine in general with the IIO approach to this (I think we
> > did have one report of a theoretical race condition in the remove path that
> > was never fully addressed).
> >
> > For IIO we also have fds that can be open but all accesses to them are proxied
> > through the IIO core and one of the things iio_device_unregister() or the devm
> > equivalent does is to set indio_dev->info = NULL (+ wake up anyone waiting on
> > data etc). Alongside removing the callbacks, that is also used as a flag
> > to indicate the device has gone.
> >
> > Note that we keep a reference to the struct indio_dev->dev (rather that the
> > underlying device) so that is not freed until the last fd is closed.
> > Thus, although devm unwinding has occurred that doesn't mean all the data
> > that was allocated with devm_xx calls is cleared up immediately.
>
> IIO is fully hot-plug and hot-unplug capable. And it will have the same
> issue. When using managed device registration that establishes a parent
> child relationship between the devices and in combination with a device
> where the managed unwinding does not happen on unbind, but rather on in
> the release callback you create a cyclic reference dependency. The child
> device holds a reference to the parent, but the reference is only
> released in the parents release callback. And since that release
> callback is not called until the last reference is dropped you end up
> with a resource leak.
>
> There are even some other places where IIO drivers run into this. E.g.
> any driver that does `devm_iio_trigger_register(&indio_dev->dev, ...)`
A driver should should not do that.
Should be devm_iio_trigger_registers(parent device, ...)
There is a corner where you need to detach the trigger from userspace
before release which is odd if it was attached by default. There are some
left over corners there that I think still cause trouble.
> creates a resource leak on the trigger and the IIO device. The indio_dev
> is not a bus device, hence no unbind and the trigger holds a reference
> so the release callback will never be called either.
>