Re: [PATCH] spi: ensure timely release of driver-allocated resources

From: Dmitry Torokhov
Date: Wed Mar 24 2021 - 18:28:09 EST


On Wed, Mar 24, 2021 at 09:32:26PM +0000, Mark Brown wrote:
> On Tue, Mar 23, 2021 at 12:04:34PM -0700, Dmitry Torokhov wrote:
> > On Tue, Mar 23, 2021 at 05:36:06PM +0000, Mark Brown wrote:
>
> > No it is ordering issue. I do not have a proven real-life example for
> > SPI, but we do have one for I2C:
>
> > https://lore.kernel.org/linux-devicetree/20210305041236.3489-7-jeff@xxxxxxxxxxx/
>
> TBH that looks like a fairly standard case where you probably don't want
> to be using devm for the interrupts in the first place. Leaving the
> interrupts live after the bus thinks it freed the device doesn't seem
> like the best idea, I'm not sure I'd expect that to work reliably when
> the device tries to call into the bus code to interact with the device
> that the bus thought was already freed anyway.

That is not an argument really. By that token we should not be using
devm for anything but memory, and that is only until we implement some
kind of memleak checker that will ensure that driver-allocated memory is
released after driver's remove() method exits.

If we have a devm API, we need to make sure it works.

You also misread the issue as being limited to interrupts, when in fact
in this particular driver it is the input device that is being
unregistered too late and so fails to quiesce the device in time. The
resulting interrupt storm is merely a side effect of this.

>
> If we want this to work reliably it really feels like we should have two
> remove callbacks in the driver core doing this rather than open coding
> in every single bus which is what we'd need to do - this is going to
> affect any bus that does anything other than just call the device's
> remove() callback. PCI looks like it might have issues too for example,
> and platform does as well and those were simply the first two buses I
> looked at. Possibly we want a driver core callback which is scheduled
> via devm (remove_devm(), cleanup() or something). We'd still need to
> move things about in all the buses but it seems preferable to do it that
> way rather than open coding opening a group and the comments about
> what's going on and the ordering requirements everywhere, it's a little
> less error prone going forward.

From the driver's perspective, devm-allocated resources are expected to
be released immediately after the ->remove() method has run. I do not
believe any driver is interested in an additional callback, and you'd
need to modify a boatload of drivers to fix this issue.

I agree with you that manual group handling might be a bit confusing
and sprinkling the same comment everywhere is not too nice, so how about
we provide:

void *devm_mark_driver_resources(struct device *dev)

and

void devm_release_driver_resources()

and keep the commentary there? The question is where to keep the
driver_res_id field: in the generic device structure, or in the bus'
device structure, since some buses and classes do not need it and we'd
be saving 4-8 bytes per device structure this way.
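
To make the intended ordering concrete, here is a rough userspace model
of the idea. It is not kernel code: struct toy_dev, the toy_* helpers
and the event log are all made up for illustration, and the real
helpers would presumably sit on top of devres groups. The only point is
that everything the driver allocated after the mark is released, in
reverse order, before the bus does its own teardown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only; every name below is hypothetical. */
#define MAX_RES 16
#define MAX_LOG 16

typedef void (*release_fn)(void);

struct toy_dev {
	release_fn res[MAX_RES];	/* release callbacks, newest last */
	size_t nres;
	size_t driver_res_id;		/* stack depth when the mark was set */
};

static const char *event_log[MAX_LOG];
static size_t nlog;

static void log_event(const char *what)
{
	assert(nlog < MAX_LOG);
	event_log[nlog++] = what;
}

/* Stand-in for any devm_*() allocation: remember how to undo it. */
static void toy_add(struct toy_dev *dev, release_fn fn)
{
	assert(dev->nres < MAX_RES);
	dev->res[dev->nres++] = fn;
}

/* Model of the proposed devm_mark_driver_resources(). */
static void toy_mark_driver_resources(struct toy_dev *dev)
{
	dev->driver_res_id = dev->nres;
}

/*
 * Model of the proposed devm_release_driver_resources(): release, in
 * reverse order, everything added since the mark.
 */
static void toy_release_driver_resources(struct toy_dev *dev)
{
	while (dev->nres > dev->driver_res_id)
		dev->res[--dev->nres]();
}

static void free_irq_res(void)     { log_event("free irq"); }
static void unregister_input(void) { log_event("unregister input"); }

/* The driver only ever uses managed allocations. */
static void toy_driver_probe(struct toy_dev *dev)
{
	toy_add(dev, free_irq_res);
	toy_add(dev, unregister_input);
}

static void toy_bus_probe(struct toy_dev *dev)
{
	log_event("pm attach");		/* bus-level setup, not managed */
	toy_mark_driver_resources(dev);
	toy_driver_probe(dev);
}

static void toy_bus_remove(struct toy_dev *dev)
{
	log_event("driver remove");	/* driver's ->remove() runs here */
	toy_release_driver_resources(dev);
	log_event("pm detach");		/* only now tear down bus state */
}
```

Calling toy_bus_probe() and then toy_bus_remove() on a zeroed
struct toy_dev records the input device and the interrupt being
released before "pm detach", which is the order drivers expect.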

Another way is to force buses to use devm for their resource management
(i.e. in the case of SPI, wrap dev_pm_domain_detach() in
devm_add_action_or_reset()). It works for buses that have a small number
of resources allocated, but gets more unwieldy and more expensive the
more resources are managed at the bus level, which is why I opted for
opening a group.
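
For comparison, a similarly rough userspace model of this alternative.
Again, nothing here is kernel code and all toy_* names are made up;
toy_add_action_or_reset() only imitates the "run the action right away
if it cannot be queued" behaviour of devm_add_action_or_reset(). The
bus queues its own teardown as a managed action before the driver
probes, so plain LIFO release runs it last. The cost is one queued
action per bus-level resource.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only; every name below is hypothetical. */
#define MAX_ACTIONS 16
#define MAX_LOG 16

struct toy_action {
	void (*fn)(void *data);
	void *data;
};

struct toy_dev {
	struct toy_action actions[MAX_ACTIONS];	/* newest last */
	size_t nactions;
};

static const char *event_log[MAX_LOG];
static size_t nlog;

/* Every release action in this model just records its own name. */
static void log_release(void *data)
{
	assert(nlog < MAX_LOG);
	event_log[nlog++] = data;
}

/*
 * Queue an action; if there is no room, run it immediately and report
 * failure so the caller does not have to unwind by hand.
 */
static int toy_add_action_or_reset(struct toy_dev *dev,
				   void (*fn)(void *), void *data)
{
	if (dev->nactions == MAX_ACTIONS) {
		fn(data);
		return -1;
	}
	dev->actions[dev->nactions].fn = fn;
	dev->actions[dev->nactions].data = data;
	dev->nactions++;
	return 0;
}

/* Release everything in reverse order of registration. */
static void toy_release_all(struct toy_dev *dev)
{
	while (dev->nactions > 0) {
		struct toy_action *a = &dev->actions[--dev->nactions];

		a->fn(a->data);
	}
}

static int toy_driver_probe(struct toy_dev *dev)
{
	if (toy_add_action_or_reset(dev, log_release, (void *)"free irq"))
		return -1;
	return toy_add_action_or_reset(dev, log_release,
				       (void *)"unregister input");
}

static int toy_bus_probe(struct toy_dev *dev)
{
	log_release((void *)"pm attach");
	/* The bus-level teardown becomes the oldest managed action. */
	if (toy_add_action_or_reset(dev, log_release, (void *)"pm detach"))
		return -1;
	return toy_driver_probe(dev);
}

static void toy_bus_remove(struct toy_dev *dev)
{
	log_release((void *)"driver remove");
	toy_release_all(dev);	/* driver resources first, detach last */
}
```

The resulting order is the same as with a group, but every additional
piece of bus state needs its own action entry, which is where the
extra bookkeeping and memory come from.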

>
> > Note how dev_pm_domain_detach() jumped ahead of everything, and
> > strictly speaking past this point we can no longer guarantee that we can
> > access the chip and disable it.
>
> Frankly it looks like the PM domain stuff shouldn't be in the probe()
> and remove() paths at all and this has been bogusly copies from other
> buses, it should be in the device registration paths. The device is in
> the domain no matter what's going on with binding it. Given how generic
> code is I'm not even sure why it's in the buses.

Here I will agree with you, but I think it comes from the power domain
"duality". In OF, a power domain represents a grouping of devices and is
static, as devices do not move around, whereas in ACPI a domain means
control, and we are putting a device under the control of ACPI PM when
we bind it to a driver. As part of that control we bring the device into
_D0, etc.

Yay for mixing concepts, but this is not really material to the question
of how to release resources in an orderly fashion.

Thanks.

--
Dmitry