Re: [PATCH v2] events/core: fix acoount failure for event's total_enable_time
From: Peter Zijlstra
Date: Fri Dec 20 2024 - 10:14:45 EST
On Fri, Dec 20, 2024 at 02:05:39PM +0000, Yeoreum Yun wrote:
> > > This account failure of total_enable_time for event could happen in below sequence.
> > >
> > > 1. two event opened with:
> > > - first event (e0) is opened with pmu0(p0) which could be added on CPU0.
> > > - second event (e1) is opened with pmu1(p1) which could be added on CPU1.
> > > - these two event belongs to the same task_ctx.
> > >
> > > at this point:
> > > task_ctx->time = 0
> > > e0->total_enable_time = 0
> > > e0->total_running_time = 0
> > > e1->total_enable_time = 0
> > > e1->total_running_time = 0
> > >
> > > 2. the task_ctx is sched in CPU0.
> > > - In ctx_sched_in(), the task_ctx->time doesn't updated.
> > > - In event_sched_in() e0 is activated so, its state becomes ACTIVE.
> > > - In event_sched_in() e1 is activated, but soon becomes !ACTIVE
> > > because pmu1 doesn't support CPU1 so it failed in pmu1->add().
> >
> > This doesn't make sense; e1 should never reach event_sched_in() and it
> > should already be INACTIVE.
> >
> > Notable events are created INACTIVE when !attr->disabled.
>
> But in perf stat code, it via enable_counter(), so it's set with
> INACTIVE.
your text above references ctx_sched_in(), what you're now saying is
__perf_event_enable(); *that* will indeed set INACTIVE, but it will then
also fail event_filter_match() and never even reschedule.
> > Also, scheduling should not get beyond merge_sched_in()'s
> > event_filter_match(), which will find the CPU is a mismatch and stop
> > right there.
> >
> > This also means the event (e1) does not get to go on flexible_active
> > (see below).
>
> No, when perf stat command with above, the cpu sets as == -1,
> So, It doesn't filter out in event_filter_match(). so it enter into
> merge_sched_in() and get into event_sched_in().
Hurmph, I thought the hybrid stuff used to set CPU.
Let me try and remember how the hybrid stuff works again. Ah
pmu::filter(), that's called in visit_groups_merge() and should stop
right there if the PMU doesn't work on that CPU.
Is your hybrid PMu not set up right?
> > > To address this, update total_enable_time in event_sched_out() when event state
> > > is PERF_EVENT_STATE_INACTIVE.
> >
> > This is a giant jump that I'm not following. Notably ctx_sched_out()
> > will only iterate pmu_ctx->{pinned,flexible}_active and that list should
> > only include ACTIVE events.
> > So how does handling INACTIVE in event_sched_out() even begin to help?
>
> the answer is in the perf_event_exit_event()'s
> perf_remove_from_context(). in here
> event_sched_out() is called via __perf_remove_from_context()
> So above case, the enable time is fixed in here.
OK, how's this then?
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 065f9188b44a..d12b402f9751 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2422,6 +2422,7 @@ __perf_remove_from_context(struct perf_event *event,
{
struct perf_event_pmu_context *pmu_ctx = event->pmu_ctx;
unsigned long flags = (unsigned long)info;
+ enum perf_event_state state = PERF_EVENT_STATE_OFF;
ctx_time_update(cpuctx, ctx);
@@ -2438,7 +2439,9 @@ __perf_remove_from_context(struct perf_event *event,
perf_child_detach(event);
list_del_event(event, ctx);
if (flags & DETACH_DEAD)
- event->state = PERF_EVENT_STATE_DEAD;
+ state = PERF_EVENT_STATE_DEAD;
+
+ perf_event_set_state(event, state);
if (!pmu_ctx->nr_events) {
pmu_ctx->rotate_necessary = 0;