Re: [PATCH] perf: update perf_cgroup time for ancestor cgroup(s)
From: Peter Zijlstra
Date: Mon Mar 12 2018 - 08:38:36 EST
On Sun, Mar 11, 2018 at 10:32:44PM -0700, Song Liu wrote:
> When a perf_event is attached to parent cgroup, it should count events
> for all children cgroups:
>
> parent_group <---- perf_event
> \
> - child_group <---- process(es)
>
> However, in our tests, we found this perf_event cannot report reliable
> results. This is because perf_event->cgrp and cpuctx->cgrp are not
> identical, thus perf_event->cgrp are not updated properly.
It might help now and in for our older selves, if you could provide a
simple reproducer for this.
> Signed-off-by: Song Liu <songliubraving@xxxxxx>
> Reported-by: Ephraim Park <ephiepark@xxxxxx>
> ---
> kernel/events/core.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 67 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 5789810..623d38f 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -766,6 +821,17 @@ perf_cgroup_set_timestamp(struct task_struct *task,
> cgrp = perf_cgroup_from_task(task, ctx);
> info = this_cpu_ptr(cgrp->info);
> info->timestamp = ctx->timestamp;
> +
> + /* set timestamp for ancestor cgroups */
> + if (cgrp->css.cgroup->level > 1) {
> + struct perf_cgroup_info *info = this_cpu_ptr(cgrp->info);
> + struct perf_event *event;
> +
> + list_for_each_entry(event, &ctx->pinned_groups, group_entry)
> + perf_ancestor_cgroup_set_timestamp(event, cgrp, info);
> + list_for_each_entry(event, &ctx->flexible_groups, group_entry)
> + perf_ancestor_cgroup_set_timestamp(event, cgrp, info);
> + }
> }
That doesn't make any kind of sense... if we're interested in ancestor
groups, then WTH are you iterating _events_ ?
I'm expecting something like:
struct cgroup_subsys_state *css;
for (css = cgrp->css; css; css = css->parent) {
/* fudge time */
}