Re: [PATCH 3/5] perf: Add pmu get/put

From: Peter Zijlstra
Date: Mon Oct 14 2024 - 15:25:38 EST


On Mon, Oct 14, 2024 at 01:20:34PM -0500, Lucas De Marchi wrote:
> On Mon, Oct 14, 2024 at 07:32:46PM +0200, Peter Zijlstra wrote:

> > I'm confused.. probably because I still don't have any clue about
> > drivers and the above isn't really telling me much either.
> >
> > I don't see how you get rid of the try_module_get() we do per event;
> > without that you can't unload the module.
>
> I don't get rid of the try_module_get(). They serve diffeerent purposes.
> Having a reference to the module prevents the _module_ going away (and
> hence the function pointers we call into from perf). It doesn't prevent
> the module unbinding from the HW. A module may have N instances if it's
> bound to N devices.
>
> This can be done today to unbind the HW (integrated graphics) from the
> i915 module:
>
> # echo 0000:00:02.0 > /sys/bus/pci/drivers/i915/unbind
>
> The ref taken by these new get()/put() are related to preventing the
> data going away - the driver can use that to take a ref on something
> that will survive the unbind.

OK, for some reason I thought to remember that you wanted to be able to
unload the module too.

> > And I don't see how you think it is safe to free a pmu while there are
> > still events around.
>
> so, we don't actually free it - the pmu is unregistered but the
> `struct pmu` and (possibly) its container are still around after unregister.
> When the get/put are used, the driver can keep the data around, which is
> then free'd when the last reference is put.

Aaaaah, okay. So the implementation knows to nop out all device
interaction when it gets unbound, but the events and pmu data stick
around until they're naturally killed off?

Ah, reading the below patches that is indeed what i915 does. pmu->closed
makes this so.

The dummy thing you posted in this thread, does perf_event_disable() on
all previously created events, and this is not sound. Userspace can do
PERF_EVENT_IOC_ENABLE on them and then things will go side-ways fast.
And I was afraid i915 was doing this same.

> - Subject: [PATCH 3/8] drm/i915/pmu: Fix crash due to use-after-free

So reading that Changelog, you would like a replacement for pmu->closed
as well.

I suppose, one way to go about doing this is to have
perf_pmu_unregister() replace a bunch of methods. Notably you have
pmu->closed in:

- event_init()
- read()
- start()
- stop()
- add()

Having perf_pmu_unregister() overwrite those function pointers with
something akin to your pmu->closed would go a long way, right? It would
require using READ_ONCE() for calling the methods, which would make
things a little ugly :/

But I also think we want to force all the events into STATE_ERROR, and
I'm not immediately sure how best to go about doing that. Adding better
return value handling to ->add() is trivial enough, and perhaps calling
perf_pmu_resched() is sufficient to cycle everything.

Let me ponder that a little bit.