Re: [PATCH v2] events/core: fix acoount failure for event's total_enable_time

From: Peter Zijlstra
Date: Fri Jan 10 2025 - 11:37:08 EST


On Fri, Dec 20, 2024 at 04:23:39PM +0000, Yeoreum Yun wrote:

> > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > index 065f9188b44a..71ed8f847b04 100644
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -2432,6 +2432,7 @@ __perf_remove_from_context(struct perf_event *event,
> > >  	if (flags & DETACH_DEAD)
> > >  		event->pending_disable = 1;
> > >  	event_sched_out(event, ctx);
> > > +	perf_event_update_time(event);
> > >  	if (flags & DETACH_GROUP)
> > >  		perf_group_detach(event);
> > >  	if (flags & DETACH_CHILD)
> >
>
> This patch doesn't work when the event is a child event.
> For a parent event, as you can see in list_del_event(),
> total_enable_time is updated properly when the state changes to
> PERF_EVENT_STATE_OFF.
>
> However, the child event's total_enable_time is added to the parent
> before list_del_event(). So the missing total_enable_time isn't added
> to the parent's event and the error print happens.
>
> So I don't think it is possible to update the time with set_state.
> Instead, I think total_enable_time should be updated before the
> child's total_enable_time is added to the parent's child_total_enable_time,
>
> like
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 065f9188b44a..d27717c44924 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -13337,6 +13337,7 @@ static void sync_child_event(struct perf_event *child_event)
>  	}
>
>  	child_val = perf_event_count(child_event, false);
> +	perf_event_update_time(child_event);
>
>  	/*
>  	 * Add back the child's count to the parent's count:

Well, that again violates the rule that we update time on state change.
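
As a rough illustration of that rule, here is a userspace toy model; it
is not the kernel code, and every toy_* name is made up for this sketch.
It assumes enabled time accrues only while the state is INACTIVE or
above, running time only while ACTIVE, and that the only place the
totals get folded is the state setter:

```c
#include <assert.h>
#include <stdint.h>

/* Toy states; negative values mean "not enabled" (assumed ordering). */
enum toy_state { TOY_OFF = -1, TOY_INACTIVE = 0, TOY_ACTIVE = 1 };

struct toy_event {
	enum toy_state state;
	uint64_t tstamp;		/* time of the last update */
	uint64_t total_time_enabled;
	uint64_t total_time_running;
};

/* Fold the time since tstamp into the totals, based on the current state. */
static void toy_update_time(struct toy_event *e, uint64_t now)
{
	uint64_t delta = now - e->tstamp;

	if (e->state >= TOY_INACTIVE)
		e->total_time_enabled += delta;
	if (e->state >= TOY_ACTIVE)
		e->total_time_running += delta;
	e->tstamp = now;
}

/* The rule: time is updated as part of a state change, nowhere else. */
static void toy_set_state(struct toy_event *e, enum toy_state state,
			  uint64_t now)
{
	if (e->state == state)
		return;
	toy_update_time(e, now);
	e->state = state;
}

/* Walk one event through OFF -> INACTIVE -> ACTIVE -> INACTIVE -> OFF. */
static struct toy_event toy_demo(void)
{
	struct toy_event e = { .state = TOY_OFF };

	toy_set_state(&e, TOY_INACTIVE, 10);	/* was OFF: nothing accrues */
	toy_set_state(&e, TOY_ACTIVE, 15);	/* enabled += 5 */
	toy_set_state(&e, TOY_INACTIVE, 40);	/* enabled += 25, running += 25 */
	toy_set_state(&e, TOY_OFF, 50);		/* enabled += 10 */
	assert(e.state == TOY_OFF);
	return e;
}
```

With the timestamps above, toy_demo() ends with 40 units enabled and 25
running; anybody reading the totals between two state changes sees only
what was folded at the last one.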

AFAICT there is no issue with simply moving the perf_event_set_state()
up a few lines in __perf_remove_from_context().

Notably event_sched_out() will already put us in OFF state; and nothing
after that cares about further states AFAICT.

So isn't the below the simpler solution?

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2438,14 +2438,13 @@ __perf_remove_from_context(struct perf_e
 		state = PERF_EVENT_STATE_DEAD;
 	}
 	event_sched_out(event, ctx);
+	perf_event_set_state(event, min(event->state, state));
 	if (flags & DETACH_GROUP)
 		perf_group_detach(event);
 	if (flags & DETACH_CHILD)
 		perf_child_detach(event);
 	list_del_event(event, ctx);

-	perf_event_set_state(event, min(event->state, state));
-
 	if (!pmu_ctx->nr_events) {
 		pmu_ctx->rotate_necessary = 0;