Re: [PATCH RESEND v5] perf/core: Fix installing arbitrary cgroup event into cpu
From: Peter Zijlstra
Date: Mon Mar 12 2018 - 08:25:06 EST
On Wed, Mar 07, 2018 at 07:19:15PM +0800, Lin Xiulei wrote:
> >> + /*
> >> + * if only the cgroup is running on this cpu
> >> + * and cpuctx->cgrp == NULL (otherwise it would've
> >> + * been set with running cgroup), we put this cgroup
> >> + * into cpu context. Or it would cause a mismatch in
> >> + * following cgroup events at event_filter_match()
> >> + */
> >
> > This is utterly incomprehensible, what?
>
> Yes, this is a bit messy. I should've made it clearer. This comment was
> supposed to explain why I modified the if statement below.
>
> And the logic is
>
> 1) when cpuctx->cgrp is NULL, we __must__ take care to set it
> appropriately; that means we __have to__ check whether the cgroup is
> running on this cpu
Yes, however, the changelog needs to explain why this was changed, and
the above does not explain where the old code was wrong.
In what case do we fail to do 1) with the current code?
The existing nr_cgroups logic ensures we only continue for the
first/last cgroup event on that context. Is the problem that if the
first cgroup event is _not_ for the current cgroup and we correctly do
_not_ set cpuctx->cgrp, a second event that _is_ for the right cgroup
will then not get the opportunity to set cpuctx->cgrp?
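To make that concrete, a simplified walk-through of the old add path
below (event A is for a non-current cgroup, event B for the current
one, both on the same CPU context):

	/* old list_update_cgroup_event(), add path, simplified */
	if (add && ctx->nr_cgroups++)
		return;			/* B: nr_cgroups already 1, we bail */

	cgrp = perf_cgroup_from_task(current, ctx);
	if (cgroup_is_descendant(cgrp->css.cgroup, event->cgrp->css.cgroup))
		cpuctx->cgrp = cgrp;	/* A: mismatch, cpuctx->cgrp stays NULL */

So A leaves cpuctx->cgrp NULL and B never gets here; after that,
event_filter_match() keeps failing for B even though its cgroup is
current.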
> 2) when cpuctx->cgrp is __NOT__ NULL, it means cpuctx->cgrp has already
> been set appropriately by cgroup_switch() or list_update_cgroup_event().
> Therefore, we do __nothing__ here
Right, but my variant double checked it: if it's _not_ NULL it must be
the current task's cgrp, otherwise we have a problem. Same difference,
more paranoia :-)
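Roughly, from memory (not what the below patch ends up doing):

	cgrp = perf_cgroup_from_task(current, ctx);
	if (cpuctx->cgrp) {
		/* if it is set, it had better be the current task's cgrp */
		WARN_ON_ONCE(cpuctx->cgrp != cgrp);
	} else if (cgroup_is_descendant(cgrp->css.cgroup,
					event->cgrp->css.cgroup)) {
		cpuctx->cgrp = cgrp;
	}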
But I suppose you're right and we can avoid a bunch of lookups if
cpuctx->cgrp is already set.
How does the below patch look?
---
Subject: perf/core: Fix installing cgroup events on CPU
From: "leilei.lin" <leilei.lin@xxxxxxxxxxxxxxx>
Date: Tue, 6 Mar 2018 17:36:37 +0800
There are two problems when installing cgroup events on CPUs: firstly,
list_update_cgroup_event() only tries to set cpuctx->cgrp for the
first event; if that one mismatches on @cgrp, we'll not try again for
later additions.
Secondly, when we install a cgroup event into an active context, we
should only issue an event reprogram when the event matches the
current cgroup context. This avoids pointless event reprogramming.
Cc: acme@xxxxxxxxxx
Cc: yang_oliver@xxxxxxxxxxx
Cc: mingo@xxxxxxxxxx
Cc: jolsa@xxxxxxxxxx
Cc: eranian@xxxxxxxxx
Cc: torvalds@xxxxxxxxxxxxxxxxxxxx
Cc: tglx@xxxxxxxxxxxxx
Cc: brendan.d.gregg@xxxxxxxxx
Cc: alexander.shishkin@xxxxxxxxxxxxxxx
Signed-off-by: leilei.lin <leilei.lin@xxxxxxxxxxxxxxx>
[peterz: Changelog and comments]
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Link: http://lkml.kernel.org/r/20180306093637.28247-1-linxiulei@xxxxxxxxx
---
kernel/events/core.c | 46 +++++++++++++++++++++++++++++++++++-----------
1 file changed, 35 insertions(+), 11 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -937,27 +937,39 @@ list_update_cgroup_event(struct perf_eve
 	if (!is_cgroup_event(event))
 		return;
-	if (add && ctx->nr_cgroups++)
-		return;
-	else if (!add && --ctx->nr_cgroups)
-		return;
 	/*
 	 * Because cgroup events are always per-cpu events,
 	 * this will always be called from the right CPU.
 	 */
 	cpuctx = __get_cpu_context(ctx);
-	cpuctx_entry = &cpuctx->cgrp_cpuctx_entry;
-	/* cpuctx->cgrp is NULL unless a cgroup event is active in this CPU .*/
-	if (add) {
+
+	/*
+	 * Since setting cpuctx->cgrp is conditional on the current @cgrp
+	 * matching the event's cgroup, we must do this for every new event,
+	 * because if the first would mismatch, the second would not try again
+	 * and we would leave cpuctx->cgrp unset.
+	 */
+	if (add && !cpuctx->cgrp) {
 		struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);
-		list_add(cpuctx_entry, this_cpu_ptr(&cgrp_cpuctx_list));
 		if (cgroup_is_descendant(cgrp->css.cgroup, event->cgrp->css.cgroup))
 			cpuctx->cgrp = cgrp;
-	} else {
-		list_del(cpuctx_entry);
-		cpuctx->cgrp = NULL;
 	}
+
+	if (add && ctx->nr_cgroups++)
+		return;
+	else if (!add && --ctx->nr_cgroups)
+		return;
+
+	/* no cgroup running */
+	if (!add)
+		cpuctx->cgrp = NULL;
+
+	cpuctx_entry = &cpuctx->cgrp_cpuctx_entry;
+	if (add)
+		list_add(cpuctx_entry, this_cpu_ptr(&cgrp_cpuctx_list));
+	else
+		list_del(cpuctx_entry);
 }
 #else /* !CONFIG_CGROUP_PERF */
@@ -2491,6 +2503,18 @@ static int __perf_install_in_context(vo
 		raw_spin_lock(&task_ctx->lock);
 	}
+#ifdef CONFIG_CGROUP_PERF
+	if (is_cgroup_event(event)) {
+		/*
+		 * If the current cgroup doesn't match the event's
+		 * cgroup, we should not try to schedule it.
+		 */
+		struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);
+		reprogram = cgroup_is_descendant(cgrp->css.cgroup,
+						 event->cgrp->css.cgroup);
+	}
+#endif
+
 	if (reprogram) {
 		ctx_sched_out(ctx, cpuctx, EVENT_TIME);
 		add_event_to_ctx(event, ctx);