Re: [tip:perf/core] [perf] da916e96e2: BUG:KASAN:null-ptr-deref_in_put_event

From: James Clark
Date: Tue Apr 15 2025 - 11:53:10 EST




On 15/04/2025 2:14 pm, Peter Zijlstra wrote:
On Tue, Apr 15, 2025 at 12:08:40PM +0200, Peter Zijlstra wrote:
On Tue, Apr 15, 2025 at 10:14:05AM +0100, James Clark wrote:
On 15/04/2025 5:46 am, Oliver Sang wrote:

yes, below patch fixes the issues we observed for da916e96e2. thanks

Tested-by: kernel test robot <oliver.sang@xxxxxxxxx>


Also fixes the same issues we were seeing:

Tested-by: James Clark <james.clark@xxxxxxxxxx>

Excellent, thank you both! Now I gotta go write me a Changelog :-)

Hmm, so while writing Changelog, I noticed something else was off. The
case where event->parent was set to EVENT_TOMBSTONE now didn't have a
put_event(parent) anymore. So that needs to be put back in as well.

Frederic, afaict this should still be okay, since if we're detached,
then nothing will try and access event->parent in the free path.

Also, nothing in perf_pending_task() will try and access either
event->parent or event->pmu.

---
Subject: perf: Fix event->parent life-time issue
From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date: Tue Apr 15 12:12:52 CEST 2025

Due to an oversight in merging da916e96e2de ("perf: Make
perf_pmu_unregister() useable") on top of 56799bc03565 ("perf: Fix
hang while freeing sigtrap event"), it is now possible to hit
put_event(EVENT_TOMBSTONE), which makes the computer sad.

This also means that for the event->parent == EVENT_TOMBSTONE, the
put_event() matching inherit_event() has gone missing.

Previously this was done in perf_event_release_kernel() after calling
perf_remove_from_context(), but with it delegated to put_event(), this
case is now entirely missed, leading to leaks.

Fixes: da916e96e2de ("perf: Make perf_pmu_unregister() useable")
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
---
kernel/events/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2343,6 +2343,7 @@ static void perf_child_detach(struct per
* not being a child event. See for example unaccount_event().
*/
event->parent = EVENT_TOMBSTONE;
+ put_event(parent_event);
}
static bool is_orphaned_event(struct perf_event *event)
@@ -5688,7 +5689,7 @@ static void put_event(struct perf_event
_free_event(event);
/* Matches the refcount bump in inherit_event() */
- if (parent)
+ if (parent && parent != EVENT_TOMBSTONE)
put_event(parent);
}

Hi Peter,

Unrelated to the pointer deref issue, I'm also seeing perf stat not working due to this commit. And that's both with and without this fixup:

-> perf stat -- true

Performance counter stats for 'true':

<not counted> msec task-clock
<not counted> context-switches
<not counted> cpu-migrations
<not counted> page-faults
<not counted> armv8_cortex_a53/instructions/
<not counted> armv8_cortex_a57/instructions/
<not counted> armv8_cortex_a53/cycles/
<not counted> armv8_cortex_a57/cycles/
<not counted> armv8_cortex_a53/branches/
<not counted> armv8_cortex_a53/branch-misses/
<not counted> armv8_cortex_a57/branch-misses/

0.074139992 seconds time elapsed

0.000000000 seconds user
0.054797000 seconds sys

Didn't look into it more other than bisecting it to this commit, but I can dig more unless the issue is obvious. This is on Arm big.LITTLE, although I didn't test it elsewhere so I'm not sure if that's relevant or not.

Thanks
James