Re: [PATCH RESEND v5] perf/core: Fix installing arbitrary cgroup event into cpu
From: Peter Zijlstra
Date: Tue Mar 06 2018 - 06:50:25 EST
On Tue, Mar 06, 2018 at 05:36:37PM +0800, linxiulei@xxxxxxxxx wrote:
> From: "leilei.lin" <leilei.lin@xxxxxxxxxxxxxxx>
>
> Do not install cgroup event into the CPU context and schedule it
> if the cgroup is not running on this CPU
OK, so far so good, this explains the bit in
__perf_install_in_context().
> While there is no task of cgroup running specified CPU, current
> kernel still install cgroup event into CPU context that causes
> another cgroup event can't be installed into this CPU.
>
> This patch prevent scheduling events at __perf_install_in_context()
> and installing events at list_update_cgroup_event() if cgroup isn't
> running on specified CPU.
This bit doesn't make sense, you don't in fact avoid anything in
list_update_cgroup_event(), you do more, not less.
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 4df5b69..f3ffa70 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -933,31 +933,45 @@ list_update_cgroup_event(struct perf_event *event,
> {
> struct perf_cpu_context *cpuctx;
> struct list_head *cpuctx_entry;
> + struct perf_cgroup *cgrp;
>
> if (!is_cgroup_event(event))
> return;
>
> /*
> * Because cgroup events are always per-cpu events,
> * this will always be called from the right CPU.
> */
> cpuctx = __get_cpu_context(ctx);
> + cgrp = perf_cgroup_from_task(current, ctx);
> +
> + /*
> + * if only the cgroup is running on this cpu
> + * and cpuctx->cgrp == NULL (otherwise it would've
> + * been set with running cgroup), we put this cgroup
> + * into cpu context. Or it would case mismatch in
> + * following cgroup events at event_filter_match()
> + */
This is utterly incomprehensible, what?
> + if (add && !cpuctx->cgrp &&
> + cgroup_is_descendant(cgrp->css.cgroup,
> + event->cgrp->css.cgroup)) {
> + cpuctx->cgrp = cgrp;
> + }
And that's just horrible coding style. Maybe something like:
if (add && cgroup_is_descendant(cgrp->css.cgroup, event->cgrp->css.cgroup)) {
if (cpuctx->cgrp)
WARN_ON_ONCE(cpuctx->cgrp != cgrp);
cpuctx->cgrp = cgrp;
}
that? But that still needs a comment to explain _why_ we do that here.
Under what condition would we fail to have cpuctx->cgrp set while
ctx->nr_cgroups. Your comment doesn't explain nor does your Changelog.
> +
> + if (add && ctx->nr_cgroups++)
> + return;
> + else if (!add && --ctx->nr_cgroups)
> + return;
>
> + /* no cgroup running */
> + if (!add)
> + cpuctx->cgrp = NULL;
> +
> + cpuctx_entry = &cpuctx->cgrp_cpuctx_entry;
> + if (add)
> list_add(cpuctx_entry, this_cpu_ptr(&cgrp_cpuctx_list));
> + else
> list_del(cpuctx_entry);
> }
>
> #else /* !CONFIG_CGROUP_PERF */
> @@ -2311,6 +2325,20 @@ static int __perf_install_in_context(void *info)
> raw_spin_lock(&task_ctx->lock);
> }
>
> +#ifdef CONFIG_CGROUP_PERF
> + if (is_cgroup_event(event)) {
> + /*
> + * Only care about cgroup events.
> + *
That bit is entirely spurious, if it right after if (is_cgroup_event()),
obviously this block is only for cgroup events.
> + * If only the task belongs to cgroup of this event,
> + * we will continue the installment
And that isn't really english. I think you meant to write something
like:
/*
* If the current cgroup doesn't match the event's
* cgroup, we should not try to schedule it.
*/
> + */
> + struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);
> + reprogram = cgroup_is_descendant(cgrp->css.cgroup,
> + event->cgrp->css.cgroup);
> + }
> +#endif
> +
> if (reprogram) {
> ctx_sched_out(ctx, cpuctx, EVENT_TIME);
> add_event_to_ctx(event, ctx);
> --
> 2.8.4.31.g9ed660f
>