Re: [PATCH] spi: ensure timely release of driver-allocated resources

From: Dmitry Torokhov
Date: Wed Mar 24 2021 - 20:18:59 EST


On Wed, Mar 24, 2021 at 11:39:35PM +0000, Mark Brown wrote:
> On Wed, Mar 24, 2021 at 03:27:23PM -0700, Dmitry Torokhov wrote:
> > On Wed, Mar 24, 2021 at 09:32:26PM +0000, Mark Brown wrote:
>
> > > TBH that looks like a fairly standard case where you probably don't want
> > > to be using devm for the interrupts in the first place. Leaving the
> > > interrupts live after the bus thinks it freed the device doesn't seem
> > > like the best idea, I'm not sure I'd expect that to work reliably when
> > > the device tries to call into the bus code to interact with the device
> > > that the bus thought was already freed anyway.
> >
> > That is not an argument really. By that token we should not be using
> > devm for anything but memory, and that is only until we implement some
> > kind of memleak checker that will ensure that driver-allocated memory is
> > released after driver's remove() method exits.
>
> I pretty much agree with that perspective TBH, I rarely see interrupt
> users that I conside safe. It's the things that actively do stuff that
> cause issues, interrupts and registration of userspace interfaces both
> being particularly likely to do so as work comes in asynchronously.
>
> > You also misread that the issue is limited to interrupts, when i fact
> > in this particular driver it is the input device that is being
> > unregistered too late and fails to timely quiesce the device. Resulting
> > interrupt storm is merely a side effect of this.
>
> My understanding was that the issue is that the driver is generating
> work because the interrupt is firing, if the interrupt had been
> unregistered we'd not be getting the interupts delivered (probably
> they'd have been disabled at the interrupt controller level) and not
> have the problem of trying to handle them on a partially unwound device.

Yes, but the root of the problem is that we rely that the device will be
stopped when input core "closes" it upon device unregistering, so even
if we did not have this particular issue with interrupt storm we would
still be at risk of accessing half-dead device. What I am trying to say
(and I believe we are actually agreeing with each other) is that not
only interrupts, but other devm-allocated objects are also can cause
problems here.

>
> > > looked at. Possibly we want a driver core callback which is scheduled
> > > via devm (remove_devm(), cleanup() or something). We'd still need to
> > > move things about in all the buses but it seems preferable to do it that
> > > way rather than open coding opening a group and the comments about
> > > what's going on and the ordering requirements everywhere, it's a little
> > > less error prone going forward.
>
> > From the driver's perspective they expect devm-allocated resources to
> > happen immediately after ->remove() method is run. I do not believe any
> > driver is interested in additional callback, and you'd need to modify
> > a boatload of drivers to fix this issue.
>
> Not a callback for the device drivers, for the buses. This is
> essentially about converting the unwinding the bus does to be sequenced
> for devm.

Ah, OK.

>
> > I agree with you that manual group handling might be a bit confusing
> > and sprinkling the same comment everywhere is not too nice, so how about
> > we provide:
>
> > void *devm_mark_driver_resources(struct device *dev)
>
> > and
>
> > void devm_release_driver_resources()
>
> > and keep the commentary there? The question is where to keep
> > driver_res_id field - in generic device structure or put it into bus'
> > device structure as some buses and classes do not need it and we'd be
> > sawing 4-8 bytes per device structure this way.
>
> I guess bus' device :/
>
> > Another way is to force buses to use devm for their resource management
> > (I.e. in case of SPI wrap dev_pm_domain_detach() in
> > devm_add_action_or_release()). It works for buses that have small number
> > of resource allocated, but gets more unwieldy and more expensive the
> > more resources are managed at bus level, that is why I opted for opening
> > a group.
>
> If the driver core were doing it and scheduling a single callback the
> bus provides then that callback could do as much as it likes...

OK, I can look into it. In the meantime wills something like below a bit
easier for you to stomach? If so I'll resubmit formally.

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index c19a09201358..a0db83bcb3d4 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -375,6 +375,11 @@ static int spi_uevent(struct device *dev, struct kobj_uevent_env *env)
return add_uevent_var(env, "MODALIAS=%s%s", SPI_MODULE_PREFIX, spi->modalias);
}

+static void spi_dev_pm_domain_detach(void *_dev)
+{
+ dev_pm_domain_detach(_dev, true);
+}
+
static int spi_probe(struct device *dev)
{
const struct spi_driver *sdrv = to_spi_driver(dev->driver);
@@ -421,11 +426,12 @@ static int spi_probe(struct device *dev)
if (ret)
return ret;

- if (sdrv->probe) {
+ ret = devm_add_action_or_reset(dev, spi_dev_pm_domain_detach, dev);
+ if (ret)
+ return ret;
+
+ if (sdrv->probe)
ret = sdrv->probe(spi);
- if (ret)
- dev_pm_domain_detach(dev, true);
- }

return ret;
}
@@ -444,8 +450,6 @@ static int spi_remove(struct device *dev)
ERR_PTR(ret));
}

- dev_pm_domain_detach(dev, true);
-
return 0;
}


--
Dmitry