Re: [PATCH 3/5] perf: Add pmu get/put

From: Lucas De Marchi
Date: Mon Oct 14 2024 - 14:20:53 EST


On Mon, Oct 14, 2024 at 07:32:46PM +0200, Peter Zijlstra wrote:
On Tue, Oct 08, 2024 at 01:34:59PM -0500, Lucas De Marchi wrote:
If a pmu is unregistered while there's an active event, perf will still
access the pmu via event->pmu, even after the event is destroyed. This
makes it difficult for drivers like i915 that can be unbound from the
HW.

BUG: KASAN: use-after-free in exclusive_event_destroy+0xd8/0xf0
Read of size 4 at addr ffff88816e2bb63c by task perf/7748

i915 tries to cope with it by installing an event->destroy, but that is
not sufficient: if pmu is released by the driver, it will still crash
since event->pmu is still used.

Moreover, even with that use-after-free fixed by adjusting the order in
_free_event() or delaying the free by the driver, the kernel still oopses
when closing the event fd related to an unregistered pmu: the percpu variables
allocated on perf_pmu_register() would already be released. One such
crash is:

BUG: KASAN: user-memory-access in _raw_spin_lock_irqsave+0x88/0x100
Write of size 4 at addr 00000000ffffffff by task perf/727

CPU: 0 UID: 0 PID: 727 Comm: perf Not tainted 6.12.0-rc1-DEMARCHI-dxnf+ #9
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS unknown 2/2/2022
Call Trace:
<TASK>
dump_stack_lvl+0x5f/0x90
print_report+0x4d3/0x50a
? __pfx__raw_spin_lock_irqsave+0x10/0x10
? kasan_addr_to_slab+0xd/0xb0
kasan_report+0xe2/0x170
? _raw_spin_lock_irqsave+0x88/0x100
? _raw_spin_lock_irqsave+0x88/0x100
kasan_check_range+0x125/0x230
__kasan_check_write+0x14/0x30
_raw_spin_lock_irqsave+0x88/0x100
? __pfx__raw_spin_lock_irqsave+0x10/0x10
_atomic_dec_and_raw_lock_irqsave+0x89/0x110
? __kasan_check_write+0x14/0x30
put_pmu_ctx+0x98/0x330

The fix here is to provide a set of get/put hooks that drivers can
implement to tie perf's pmu lifecycle to the driver's
instance lifecycle. With this, perf_pmu_unregister() can be called by
the driver, which is then responsible for freeing the resources.

I'm confused.. probably because I still don't have any clue about
drivers and the above isn't really telling me much either.

I don't see how you get rid of the try_module_get() we do per event;
without that you can't unload the module.

I don't get rid of the try_module_get(). They serve different purposes.
Having a reference to the module prevents the _module_ going away (and
hence the function pointers we call into from perf). It doesn't prevent
the module unbinding from the HW. A module may have N instances if it's
bound to N devices.

This can be done today to unbind the HW (integrated graphics) from the
i915 module:

# echo 0000:00:02.0 > /sys/bus/pci/drivers/i915/unbind

The refs taken by these new get()/put() hooks are about preventing the
data from going away: the driver can use them to take a ref on something
that will survive the unbind.


And I don't see how you think it is safe to free a pmu while there are
still events around.

so, we don't actually free it - the pmu is unregistered but the
`struct pmu` and (possibly) its container are still around after unregister.
When the get/put are used, the driver can keep the data around, which is
then free'd when the last reference is put.
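
To make that lifetime concrete, here is a minimal userspace sketch of the
idea, not the kernel code. Every toy_* name is hypothetical and made up for
illustration; the refcount is a plain int, so it only models the
single-threaded ordering (the real thing would need a proper refcount and
locking). The device holds one reference while bound, each open event holds
one more, and the memory is freed only when the last one is put.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver's per-device pmu container. */
struct toy_pmu {
	int refcount;      /* plain int: single-threaded model only */
	bool registered;   /* false once the driver has "unregistered" */
	bool *freed_flag;  /* lets the caller observe the final free */
};

static struct toy_pmu *toy_pmu_create(bool *freed_flag)
{
	struct toy_pmu *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->refcount = 1;           /* reference held by the bound device */
	p->registered = true;
	p->freed_flag = freed_flag;
	*freed_flag = false;
	return p;
}

static void toy_pmu_get(struct toy_pmu *p)
{
	p->refcount++;
}

static void toy_pmu_put(struct toy_pmu *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0) {
		*p->freed_flag = true; /* record the free before releasing */
		free(p);
	}
}

/* Models unbind: unregister, then drop the device's own reference. */
static void toy_pmu_unbind(struct toy_pmu *p)
{
	p->registered = false;
	toy_pmu_put(p);
}

/* Models opening an event: refused after unregister, else takes a ref. */
static bool toy_event_open(struct toy_pmu *p)
{
	if (!p->registered)
		return false;
	toy_pmu_get(p);
	return true;
}

/* Models closing the event fd: drops the event's reference. */
static void toy_event_close(struct toy_pmu *p)
{
	toy_pmu_put(p);
}
```

With one event open, toy_pmu_unbind() leaves the structure allocated and
only marks it unregistered; the later toy_event_close() is what frees it.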


Nor do I really see what these new get/put methods do. I see you call
->put() where we do module_put(), and ->get() near try_module_get(), but
how is that helping?

Maybe the specific patches for i915 can help? Patch series:
https://lore.kernel.org/intel-gfx/20241011225430.1219345-1-lucas.demarchi@xxxxxxxxx/

Important patches here are patches 2 and 3:

- Subject: [PATCH 2/8] drm/i915/pmu: Let resource survive unbind

Allow the final kfree() to happen at a different time, not
necessarily together with the call to perf_pmu_unregister().
Here it uses drmm_add_action() to tie the free to the last drm ref going
away.

- Subject: [PATCH 3/8] drm/i915/pmu: Fix crash due to use-after-free

This implements the get()/put() so we get/put a reference to the drm
dev.

These 2 patches for i915 are the equivalent of patch 4 in this series
for the dummy_pmu.

Lucas De Marchi