Re: [PATCH 3/5] perf: Add pmu get/put

From: Lucas De Marchi
Date: Mon Oct 14 2024 - 16:13:06 EST


On Mon, Oct 14, 2024 at 09:25:19PM +0200, Peter Zijlstra wrote:
On Mon, Oct 14, 2024 at 01:20:34PM -0500, Lucas De Marchi wrote:
On Mon, Oct 14, 2024 at 07:32:46PM +0200, Peter Zijlstra wrote:

> I'm confused.. probably because I still don't have any clue about
> drivers and the above isn't really telling me much either.
>
> I don't see how you get rid of the try_module_get() we do per event;
> without that you can't unload the module.

I don't get rid of the try_module_get(). They serve diffeerent purposes.
Having a reference to the module prevents the _module_ going away (and
hence the function pointers we call into from perf). It doesn't prevent
the module unbinding from the HW. A module may have N instances if it's
bound to N devices.

This can be done today to unbind the HW (integrated graphics) from the
i915 module:

# echo 0000:00:02.0 > /sys/bus/pci/drivers/i915/unbind

The ref taken by these new get()/put() are related to preventing the
data going away - the driver can use that to take a ref on something
that will survive the unbind.

OK, for some reason I thought to remember that you wanted to be able to
unload the module too.

humn... maybe because in the first version we were talking about
shortcutting all the function calls. If made by the driver, it'd allow
to remove the ugly `if (pmu->closed)`, and if made by perf itself, it
would allow to drop the module ref since we would guarantee we are not
calling into the module anymore.

But I think that's orthogonal to what we really care about: once the HW
vanishes underneath us, stop accessing it and unregister the PMU in a
way that if the HW shows up again we can still register it and nothing
explodes.


> And I don't see how you think it is safe to free a pmu while there are
> still events around.

so, we don't actually free it - the pmu is unregistered but the
`struct pmu` and (possibly) its container are still around after unregister.
When the get/put are used, the driver can keep the data around, which is
then free'd when the last reference is put.

Aaaaah, okay. So the implementation knows to nop out all device
interaction when it gets unbound, but the events and pmu data stick
around until they're naturally killed off?

Ah, reading the below patches that is indeed what i915 does. pmu->closed
makes this so.

yes. Without these patches it doesn't work well though, particuarly
because

a) even if we protected the methods with pmu->closed(), the data is
freed when we call perf_pmu_unregister(), making the whole pmu pointer
invalid
b) kernel/core/events.c still accesss the pmu after calling
event->destroy() - we can't really hook on that to destroy the pmu like
is done today.


The dummy thing you posted in this thread, does perf_event_disable() on

that's optional and I think we could live without. The main issue is
completely crashing the kernel if we do:

# perf stat -e i915/rc6-residency/ -I1000 &
# echo 0000:00:02.0 > /sys/bus/pci/drivers/i915/unbind

... which these patches are fixing regardless of calling
perf_event_disable().

all previously created events, and this is not sound. Userspace can do
PERF_EVENT_IOC_ENABLE on them and then things will go side-ways fast.
And I was afraid i915 was doing this same.

For the i915 series, that would be "[PATCH 8/8] drm/i915/pmu: Release
open events when unregistering". See release_active_events()

I was just wanting a smoke signal/hint for userspace that "something
went wrong" with this counter.


- Subject: [PATCH 3/8] drm/i915/pmu: Fix crash due to use-after-free

So reading that Changelog, you would like a replacement for pmu->closed
as well.

I suppose, one way to go about doing this is to have
perf_pmu_unregister() replace a bunch of methods. Notably you have
pmu->closed in:

- event_init()
- read()
- start()
- stop()
- add()

Having perf_pmu_unregister() overwrite those function pointers with
something akin to your pmu->closed would go a long way, right? It would

it would help if we want to unload the module, to make it work for
other drivers without having to add similar flag and to make those
drivers less ugly with those checks. However grepping the
kernel for calls to perf_pmu_register(), it seems there are only 3
candidates, all in drm: i915, amdgpu and xe (the xe is a pending patch
series waiting on the resolution of this issue). There is
drivers/powercap/intel_rapl_common.c with its `bool registered` flag,
but that is basically doing multiple register/unregister to replace the pmu
rather than working with HW that can be removed.

require using READ_ONCE() for calling the methods, which would make
things a little ugly :/

But I also think we want to force all the events into STATE_ERROR, and

yeah... When I was looking for the smoke signal I mentioned, I wanted
something to move the event into that state, but couldn't find one. The
best I found was to disable it.

I'm not immediately sure how best to go about doing that. Adding better
return value handling to ->add() is trivial enough, and perhaps calling
perf_pmu_resched() is sufficient to cycle everything.

Let me ponder that a little bit.

thanks for the help

Lucas De Marchi