Re: [PATCH 2/2] genirq: proc: fix a procfs entry leak

From: Bartosz Golaszewski
Date: Wed Sep 06 2023 - 10:55:24 EST


On Wed, Aug 30, 2023 at 12:29 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Tue, Aug 29 2023 at 14:24, Bartosz Golaszewski wrote:
> > On Tue, Aug 29, 2023 at 11:11 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >> > On Mon, Aug 28, 2023 at 11:44 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >> So when the USB device disconnects, then something needs to tell the
> >> i2c-xx-driver that the I2C interface is not longer available, right?
> >>
> >> IOW, the unbind operation needs the following notification and teardown
> >> order:
> >>
> >> 1) USB core notifies hid-cp2112
> >>
> >> 2) hid-cp2112 notifies i2c-xx-driver
> >>
> >> 3) i2c-xx-driver mops up and invokes free_irq()
> >>
> >> 4) hid-cp2112 removes the interrupt domain
> >>
> >> But it seems that you end up with a situation where the notification of
> >> the i2c-xx-driver is either not happening or the driver in question is
> >> simply failing to mop up and free the requested interrupt.
> >>
> >
> > Yes, there's no notification of any kind.
>
> I'm not buying that.

Gah, sorry, it took me a while to get back to this thread...

>
> usb disconnect
> ...
> cp2112_remove()
> i2c_del_adapter()
> i2c_unregister_device(client)
> ...
> device_unregister()
> device_del()
> bus_notify() // Mechanism #1
> i2c_device_remove()
> if (dev->remove)
> dev->remove()
> ...
> device_unbind_cleanup()
> devres_release_all() // Mechanism #2
>
> gpiochip_remove()
>
> There are very well notifications to the drivers about unplug of a
> device. Otherwise this would end up in a complete disaster and a lot
> more stale data and state than just a procfs file or a requested
> interrupt.

I'm not sure how either of the two helps here. #2 just releases
managed resources owned by cp2112. It can remove the domain with an
appropriate devm action but it won't do anything for the users of
interrupts. #1 is a bus notification emitted when the I2C adapter
exposed by cp2112 has been deleted. This one in particular doesn't
help us, the domain is long gone by now but if I get what you mean
correctly, you'd want the driver to call request_irq() and then set up
a notifier block for the BUS_NOTIFY_UNBIND_DRIVER notification of the
provider of that interrupt? Doesn't that break like half a dozen of
abstraction layers? Because now the device driver which is the GPIO
consumer needs to know where it gets its interrupts from?

>
> So the mechanisms are there, no?
>

Technically you could duct tape something up. But this is far from
being generic and it would be a per-driver solution at best, not even
per-subsystem. Because this is just interrupts. What about other
providers? Setup a notification for every single one? The resources a
device needs are normally acquired in probe() and released in
remove(). Possibly there are per-systems setup/teardown, open/close
etc. callbacks but it's not much different. What would you do in this
notification callback? Release the resource and keep going? In what
state would that device be now?

You would think that plug-and-play works well in the kernel and it's
true for certain parts but it really isn't the case for subsystems
that were not considered as very plug-and-play until people started
putting them on a stick. Some devices that are not typically
hot-pluggable - like serial - have been used with USB for so long that
they do handle unplugging very well. But as soon as you put i2c on
USB, you'll see what a mess it is. If you have an I2C device on a USB
I2C expander and it's being used, when you pull the plug, you'll see
that the kernel thread removing the device will block on a call to
wait_for_completion() until all users release their i2c adapter
references. They (the users) are not however notified in any generic
way about the provider of their resources being gone.

And even if they were, what would they do? "Unbind themselves" so that
their remove paths be triggered where they would free resources? That
may be a solution I suppose - a device whose non-optional resource
provider is gone, should go down as well I suppose. This would be a
huge development however and a more realistic approach is to make
provider APIs resistant to the backing device being removed.

> If this is just about the problem that some device driver writers fail
> to implement them correctly, then yes it makes sense to have a last
> resort fallback which cleans them up and emits a big fat warning.
>
> Making this a programming model would be beyond wrong.
>
> > It's a common problem unfortunately across different subsystems. We
> > have hot-unpluggable consumers using resources that don't support it
> > (like interrupts in this example).
>
> All hotpluggable consumers have at least one mechanism to mop up the
> resources they allocated. There are a lot of resources in the kernel
> which do not clean themself up magically.
>

Yeah, hotpluggable consumers are fine. The problem here is
hotpluggable *providers* with consumers who don't know that.

> So what's so special about interrupts? They are not any different from a
> pointer which is registered at some entity and the device driver writer
> forgets to unregister it, but the underlying resource is freed. That's
> even worse than the leaked interrupt and cannot be magically undone at
> all.

But that's precisely my point. It's not only interrupts. There *IS* a
wider problem in the kernel with resources that are provided by
devices that can disappear at any moment. My argument is that the
provider API should handle that and not rely on the consumer to
release those resources in time. Serial or GPIO handle that by having
a reference counted outer layer and an inner implementation that can
go from under it. The outer layer then simply returns an appropriate
error to the consumer. During any of the provider calls, the
lower-level object is locked and protected from being destroyed.

>
> Whatever you try, you can't turn driver programming into a task which
> can be accomplished w/o brains.
>

I merely want to reduce the number of obvious traps. :)

> > For interrupts it would mean that when the consumer calls
> > request_irq(), the number it gets is a weak reference to the irq_desc.
>
> Interrupt numbers are weak references by definition. request_irq() does
> not return an interrupt number, it returns success or fail. The
> interrupt number is handed to request_irq(), no?
>
> The entities which hand out the interrupt number are a complete
> different story. But that number is from their perspective a weak
> reference too.
>
> > For any management operation we lock irq_desc.
>
> That's required anyway, but irq_desc::lock is not a sufficient
> protection against a teardown race.
>
> > If the domain is destroyed, irq_descs get destroyed with it
>
> Interrupts consist of more than just an interrupt descriptor. If you
> care to look at the internals, then the descriptor is the last entity
> which goes away simply because all other related resources hang off the
> interrupt descriptor.
>
> So they obviously need to be mopped up first and trying to mop up
> requested interrupts at the point where the interrupt descriptor is
> freed is way too late.
>
> In fact they need to mopped up first, which is obvious from the setup
> ordering as everything underneath must be in place already before
> request_irq() can succeed. The only sensible ordering for teardown is
> obviously the reverse of setup, but that's not specific to interrupts at
> all.
>
> > (after all users leave the critical section).
>
> Which critical section? Interrupts have more than just the
> irq_desc::lock critical section which needs to be left.
>
> > Next call to any of the functions looks up the irq number and sees
> > it's gone. It fails or silently returns depending on the function
> > (e.g. irq_free() would have to ignore the missing lookup).
>
> That's what happens today already.
>
> The missing bit is the magic function which mops up when the driver
> writer blew it up. But that's far from a 'put a trivial free_irq() call
> somewhere'.
>
> First of all we have too many interrupt mechanisms ranging from the
> linux 0.1 model with static interrupt spaces to hierarchical interrupt
> domains.
>
> - Almost everything which is interrupt domain based (hierarchical or
> not) can probably be "addressed" by some definition of "addressed"
> because there are only a few places in the core code which are
> involved in releasing individual or domain wide resources.
>
> But see below.
>
> - The incarnations which do not use interrupt domains are way harder
> because the teardown of the interrupt resources is not happening in
> the core code. It's happening at the driver side and the core is
> only involved to release the interrupt descriptor. But that's too
> late.
>
> The good news about those is that probably the vast majority of the
> instances is built in, mostly legacy and not pluggable.
>
> So lets assume that anything which is not interrupt domain based is not
> relevant, which reduces the problem space significantly. But the
> remaining space is still non-trivial.
>
> 1) Life-time
>
> Interrupt descriptors are RCU freed but that's mostly for external
> usage like /proc/interrupts
>
> Still most of the core code relies on the proper setup/teardown
> order, so there would be quite some work to do vs. validating that
> an interrupt descriptor is consistent at all hierarchy levels.
>
> That's going to be interesting vs. interfaces which can be invoked
> from pretty much any context (except NMI).
>
> irq_desc::lock is not the panacea here because it obviously can
> only be held for real short truly atomic sections. But quite some
> of the setup/teardown at all levels requires sleepable context.
>
> 2) Concurrency
>
> Protection against concurrent interrupt handler execution is
> covered in free_irq() as it fully synchronizes.
>
> That's the least of my worries.
>
> Whatever the invalidation mechanism might be, pretty much every
> usage site of irq_to_desc() and any usage site of interrupt
> descriptors which are stored outside of the maple_tree for
> performance reasons need to be audited.
>
> The validation has to take into account whether that's an requested
> descriptor or an unused one.
>
> So no, neither removing some procfs entry at the wrong point nor
> slapping some variant of free_irq() into random places is going to cut
> it.
>
> Your idea of tracking request_irq()/free_irq() at some subsystem level
> is not going to work either simply because it requires that such muck is
> sprinkled all over the place.
>

I was thinking more about tracking it at the irq domain level so that
when a domain is destroyed with interrupts requested, these interrupts
are freed. I admit I still don't have enough in-depth knowledge about
linux interrupts to understand why it can't work, I need to spend
more time on this. I'll be back.

> I completely agree that the interrupt core code does not have enough
> places which emit a prominent warning when the rules are violated.
>
> So sure, in some way or the other a "fix" by some definition of "fix"
> could be implemented, but I'm really not convinced that's the right way
> to waste^Wspend our time with.
>
> Especially not because interrupt handling is a hot-path and any
> life-time/conncurrency validation mechanism will introduce overhead no
> matter what.
>

Thanks for the detailed explanation.

Bartosz