Re: [PATCH] gpio: cdev: fix a crash on line-request release

From: Kent Gibson
Date: Wed May 24 2023 - 00:36:41 EST


On Wed, May 24, 2023 at 10:09:42AM +0800, Kent Gibson wrote:
> On Wed, May 24, 2023 at 07:58:36AM +0800, Kent Gibson wrote:
> > On Tue, May 23, 2023 at 05:51:01PM +0200, Bartosz Golaszewski wrote:
> > > When a GPIO device is forcefully unregistered, we are left with an
> > > inactive object. If user-space kept an open file descriptor to a line
> > > request associated with such a structure, upon closing it, we'll see the
> > > kernel crash due to freeing unexistent GPIO descriptors.
> > >
> >
> > > @@ -1565,17 +1571,21 @@ static ssize_t linereq_read(struct file *file, char __user *buf,
> > >
> > > static void linereq_free(struct linereq *lr)
> > > {
> > > + struct gpio_device *gdev = lr->gdev;
> > > unsigned int i;
> > >
> > > for (i = 0; i < lr->num_lines; i++) {
> > > if (lr->lines[i].desc) {
> > > edge_detector_stop(&lr->lines[i]);
> > > - gpiod_free(lr->lines[i].desc);
> > > + down_write(&gdev->sem);
> > > + if (gdev->chip)
> > > + gpiod_free(lr->lines[i].desc);
> > > + up_write(&gdev->sem);
>
> Ummm, taking another look at the oops I sent you, the crash actually
> occurs in edge_detector_stop():
>
> May 23 11:47:06 firefly kernel: [ 4216.877056] Call Trace:
> May 23 11:47:06 firefly kernel: [ 4216.877512] <TASK>
> May 23 11:47:06 firefly kernel: [ 4216.877924] irq_domain_deactivate_irq+0x19/0x30
> May 23 11:47:06 firefly kernel: [ 4216.878543] free_irq+0x257/0x360
> May 23 11:47:06 firefly kernel: [ 4216.879056] linereq_free+0x9b/0xe0
> May 23 11:47:06 firefly kernel: [ 4216.879608] linereq_release+0xc/0x20
> May 23 11:47:06 firefly kernel: [ 4216.880230] __fput+0x87/0x240
> May 23 11:47:06 firefly kernel: [ 4216.880744] task_work_run+0x54/0x80
>
> That free_irq() call is in edge_detector_stop() (which apparently is inlined),
> not in gpiod_free().
>
> So pretty sure this patch doesn't even solve my problem, but I will test
> it to confirm.
>

Yeah, doesn't fix my problem still crashes.

If the line request doesn't have edge detection enabled (so no
irq) then I don't get a crash.
i.e. use gpioset to request the line, rather than gpiomon.

(To provide background for anyone else trying to follow along, the
scenario is:
1. create a gpio-sim
2. request a line
3. destroy the gpio-sim
4. release the line.

3 triggers this error:

May 24 11:11:12 firefly kernel: [ 200.027280] gpio_stub_drv gpiochip0: REMOVING GPIOCHIP WITH GPIOS STILL REQUESTED

and 4 triggers a crash - if the requested line holds an irq.)

I would point out:
/**
* gpiochip_remove() - unregister a gpio_chip
* @gc: the chip to unregister
*
* A gpio_chip with any GPIOs still requested may not be removed.
*/
void gpiochip_remove(struct gpio_chip *gc)

which is where that dev_crit() is, so destroying the gpio-sim has already
invalidated that contract.

Anyway, it seems my problem is the forced removal of the gpiochip invalidates
the irq that the line request is holding.
Not sure how best to deal with that.

Moving the edge_detector_stop() inside the "if (gdev->chip)" check of
your patch does prevent crash.
But in that case edge_detector_stop() does other cleanup that is no longer
getting done.
Perhaps if the chip is gone then zero line->irq prior to calling
edge_detector_stop()?
Again, this is starting to feel like a hack for gpiolib not being good
at telling the client that it has to pull the rug.
Though according the the gpiochip_remove() docs, it WONT pull the rug,
so you get that.

Cheers,
Kent.