Re: [External] Re: [PATCH v2 1/6] perf/core: Fix inconsistency between cgroup sched_out and sched_in

From: Peter Zijlstra
Date: Wed Mar 23 2022 - 04:12:01 EST


On Tue, Mar 22, 2022 at 11:28:41PM +0800, Chengming Zhou wrote:
> On 2022/3/22 11:16 PM, Chengming Zhou wrote:
> > Hi Peter,
> >
> > On 2022/3/22 10:54 PM, Peter Zijlstra wrote:
> >> On Tue, Mar 22, 2022 at 09:38:21PM +0800, Chengming Zhou wrote:
> >>> On 2022/3/22 8:59 PM, Peter Zijlstra wrote:
> >>>> On Tue, Mar 22, 2022 at 08:08:29PM +0800, Chengming Zhou wrote:
> >>>>> There is a race problem that can trigger WARN_ON_ONCE(cpuctx->cgrp)
> >>>>> in perf_cgroup_switch().
> >>>>>
> >>>>> CPU1                                  CPU2
> >>>>> (in context_switch)                   (attach running task)
> >>>>> perf_cgroup_sched_out(prev, next)
> >>>>>   cgrp1 == cgrp2 is True
> >>>>>                                       next->cgroups = cgrp3
> >>>>>                                       perf_cgroup_attach()
> >>>>> perf_cgroup_sched_in(prev, next)
> >>>>>   cgrp1 == cgrp3 is False
>
> I see, you must have been misled by my wrong drawing above ;-)
> I'm sorry, perf_cgroup_attach() on the right should be put at the bottom.
>
> CPU1                                  CPU2
> (in context_switch)                   (attach running task)
> perf_cgroup_sched_out(prev, next)
>   cgrp1 == cgrp2 is True
>                                       next->cgroups = cgrp3
> perf_cgroup_sched_in(prev, next)
>   cgrp1 == cgrp3 is False
>                                       __perf_cgroup_move()
>

Ohhhh, you're talking about CPU2 running cgroup_migrate_execute()...
clear as mud this :/

I think I remember this race; in the scheduler we fixed it by not using
task_css to track the active cgroup and using the various cgroup_subsys
hooks to keep an internally consistent set of state.

But let me go look at what you did in this new light.