Re: [PATCH 2/3] gpiolib: use kref in gpio_desc

From: Bartosz Golaszewski
Date: Thu May 14 2020 - 09:42:19 EST


pon., 30 mar 2020 o 16:36 Bartosz Golaszewski <brgl@xxxxxxxx> napisaÅ(a):
>
> czw., 26 mar 2020 o 21:50 Linus Walleij <linus.walleij@xxxxxxxxxx> napisaÅ(a):
> >
> > On Fri, Mar 13, 2020 at 3:47 PM Bartosz Golaszewski <brgl@xxxxxxxx> wrote:
> > > czw., 12 mar 2020 o 11:35 Linus Walleij <linus.walleij@xxxxxxxxxx> napisaÅ(a):
> >
> > > In this case I was thinking about a situation where we pass a
> > > requested descriptor to some other framework (nvmem in this case)
> > > which internally doesn't know anything about who manages this resource
> > > externally. Now we can of course simply not do anything about it and
> > > expect the user (who passed us the descriptor) to handle the resources
> > > correctly. But what happens if the user releases the descriptor not on
> > > driver detach but somewhere else for whatever reason while nvmem
> > > doesn't know about it? It may try to use the descriptor which will now
> > > be invalid. Reference counting in this case would help IMHO.
> >
> > I'm so confused because I keep believing it is reference counted
> > elsewhere.
> >
> > struct gpio_desc *d always comes from the corresponding
> > struct gpio_device *descs array. This:
> >
> > struct gpio_device {
> > int id;
> > struct device dev;
> > (...)
> > struct gpio_desc *descs;
> > (...)
> >
> > This array is allocated in gpiochip_add_data_with_key() like this:
> >
> > gdev->descs = kcalloc(chip->ngpio, sizeof(gdev->descs[0]), GFP_KERNEL);
> >
> > Then it gets free:d in gpiodevice_release():
> >
> > static void gpiodevice_release(struct device *dev)
> > {
> > struct gpio_device *gdev = dev_get_drvdata(dev);
> > (...)
> > kfree(gdev->descs);
> > kfree(gdev);
> > }
> >
> > This is the .release function for the gdev->dev, the device inside
> > struct gpio_device,
> > i.e. the same device that contains the descs in the first place. So it
> > is just living
> > and dying with the struct gpio_device.
> >
> > struct gpio_device does *NOT* die in the devm_* destructor that gets called
> > when someone has e.g. added a gpiochip using devm_gpiochip_add_data().
> >
> > I think the above observation is crucial: the lifetime of struct gpio_chip and
> > struct gpio_device are decoupled. When the struct gpio_chip dies, that
> > just "numbs" all gpio descriptors but they stay around along with the
> > struct gpio_device that contain them until the last
> > user is gone.
> >
> > The struct gpio_device reference counted with the call to get_device(&gdev->dev)
> > in gpiod_request() which is on the path of gpiod_get_[index]().
> >
> > If a consumer gets a gpio_desc using any gpiod_get* API this gets
> > increased and it gets decreased at every gpiod_put() or by the
> > managed resources.
> >
> > So should you not rather exploit this fact and just add something
> > like:
> >
> > void gpiod_reference(struct gpio_desc *desc)
> > {
> > struct gpio_device *gdev;
> >
> > VALIDATE_DESC(desc);
> > gdev = desc->gdev;
> > get_device(&gdev->dev);
> > }
> >
> > void gpiod_unreference(struct gpio_desc *desc)
> > {
> > struct gpio_device *gdev;
> >
> > VALIDATE_DESC(desc);
> > gdev = desc->gdev;
> > put_device(&gdev->dev);
> > }
> >
> > This should make sure the desc and the backing gpio_device
> > stays around until all references are gone.
> >
> > NB: We also issue try_module_get() on the module that drives the
> > GPIO, which will make it impossible to unload that module while it
> > has active GPIOs. I think maybe the whole logic inside gpiod_request()
> > is needed to properly add an extra reference to a gpiod else someone
> > can (theoretically) pull out the module from below.
> >
>
> Thanks a lot for the detailed explanation. I'll make some time
> (hopefully soon) to actually test this path and let you know if it
> works as expected.
>
> Best regards,
> Bartosz Golaszewski

Hi Linus,

So this "numbing down" of the chip works - in that I don't see any
splat in the above use-case but right now if nvmem takes an existing
GPIO descriptor over nvmem_config, then it will call gpiod_put() on it
and we'll do the same in the provider driver leading to the following
warning:

[ 109.191755] ------------[ cut here ]------------
[ 109.191787] WARNING: CPU: 0 PID: 207 at drivers/gpio/gpiolib.c:3097
release_nodes+0x1ac/0x1f8
[ 109.191794] Modules linked in: at24
[ 109.191975] CPU: 0 PID: 207 Comm: rmmod Not tainted
5.7.0-rc5-00001-g8c4cd0ae52ce-dirty #12
[ 109.191982] Hardware name: Generic AM33XX (Flattened Device Tree)
[ 109.192028] [<c01119ec>] (unwind_backtrace) from [<c010b9c0>]
(show_stack+0x10/0x14)
[ 109.192050] [<c010b9c0>] (show_stack) from [<c05456b4>]
(dump_stack+0xc0/0xe0)
[ 109.192076] [<c05456b4>] (dump_stack) from [<c0138b30>] (__warn+0xc0/0xf8)
[ 109.192095] [<c0138b30>] (__warn) from [<c0138ec4>]
(warn_slowpath_fmt+0x58/0xb8)
[ 109.192112] [<c0138ec4>] (warn_slowpath_fmt) from [<c0635938>]
(release_nodes+0x1ac/0x1f8)
[ 109.192136] [<c0635938>] (release_nodes) from [<c0631d7c>]
(device_release_driver_internal+0xf8/0x1b8)
[ 109.192154] [<c0631d7c>] (device_release_driver_internal) from
[<c0631eac>] (driver_detach+0x58/0xa8)
[ 109.192172] [<c0631eac>] (driver_detach) from [<c0630ae0>]
(bus_remove_driver+0x4c/0xa4)
[ 109.192191] [<c0630ae0>] (bus_remove_driver) from [<c01d80e0>]
(sys_delete_module+0x1bc/0x240)
[ 109.192211] [<c01d80e0>] (sys_delete_module) from [<c0100260>]
(__sys_trace_return+0x0/0x20)

I'm not sure how to go about fixing it though. We could check in nvmem
if the descriptor was retrieved locally or passed from the nvmem
provider and the either put it or not respectively but this isn't very
clean.

Bart