Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

From: Tejun Heo
Date: Tue May 05 2015 - 12:07:52 EST


Hello, Peter.

On Mon, May 04, 2015 at 02:37:38PM +0200, Peter Zijlstra wrote:
> > I just realized we allow removing/adding controllers from/to cgroups
> > while there are tasks in them, which isn't safe unless we eliminate all
> > can_attach callbacks. We've done so for some cgroup subsystems, but
> > there are still a few of them...
>
> You can't remove can_attach(), we must be able to disallow joining a
> cgroup.
>
> If that results in you not being able to change the cgroup setup with
> tasks in, so be it -- that seems like a sane restriction anyhow.

This is really an interface policy issue. For all other controllers,
it's almost trivial to let organizational operations (setting up
hierarchies, moving processes around) overrule controller
configurations. The main benefit of doing this is that this decouples
organizational operations from resource control. Users can depend on
the fact that allowed organizational operations won't fail due to
specific controller configuration issues.

This also works well with controllers accepting target configurations
regardless of the current state and enforcing rules to converge to the
configured state instead. e.g. if you set max memory lower than the
current usage, the config will be accepted and the controller will
keep trying to make the current state converge to the target state.
This is important as rejecting configuration can lead to a chasing
game between configuration attempts and run-away resource consumption.
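
To make the accept-then-converge model concrete, here is a rough
userspace sketch. It is a toy, not kernel code: struct toy_group and
the toy_* helpers are hypothetical names made up for illustration, and
"reclaim" is reduced to subtracting pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of one group's memory state; not kernel code. */
struct toy_group {
	size_t usage;	/* pages currently charged */
	size_t max;	/* configured target limit */
};

/* Writing the limit always succeeds, even if usage is already above it. */
static void toy_set_max(struct toy_group *g, size_t new_max)
{
	g->max = new_max;
}

/*
 * One enforcement pass: take back up to @batch pages while the group is
 * over its limit.  Returns true once usage has converged to the target.
 */
static bool toy_enforce(struct toy_group *g, size_t batch)
{
	assert(batch > 0);

	if (g->usage > g->max) {
		size_t excess = g->usage - g->max;

		g->usage -= excess < batch ? excess : batch;
	}
	return g->usage <= g->max;
}

/* New charges fail if they would exceed the limit, so usage can't run away. */
static bool toy_charge(struct toy_group *g, size_t pages)
{
	if (g->usage + pages > g->max)
		return false;
	g->usage += pages;
	return true;
}
```

With usage at 100 and the limit lowered to 40, toy_set_max() never
fails; further toy_charge() calls are refused and repeated
toy_enforce() passes bring usage down to 40. The configuration write
is never the thing that races against consumption.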

Now, RR slices are the special case here because they're inherently
different from every other resource cgroup is concerned with. They
simply don't fit into the same model that other resources follow.
There are several options we can try.

1. Decouple RR slices from the cpu controller. This would be the best
   route to follow. RR slices need a hard allocator no matter what we
   do. There isn't much point in imposing hierarchical structure on
   top of it.

2. Implement special case behavior so that it can follow the same
   model. e.g. resetting RR scheduling config when the effective cpu
   cgroup changes or carrying the amount of slice being consumed with
   the process being moved. No matter how this is done, it's gonna be
   a clear compromise as we're forcing this into a model which it
   doesn't quite fit. That said, given how RR slices are a special
   case to begin with, I think this can be acceptable.

3. Make the compromise in the other direction - add exceptions to
   organizational operations but clearly limit the failure modes. We
   prolly want to structure the code in a way that enforces this.

4. If #1 can be done in time but not right now, simply disallow any
   RR/FIFO in !root cgroups on the unified hierarchy for now.
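
As a rough illustration of the rule in #4, the check itself would be
tiny. The sketch below is userspace C with made-up names (struct
toy_cgroup, toy_rt_allowed()); it only uses the SCHED_* policy
constants from <sched.h> and says nothing about where such a check
would actually live in the scheduler or cgroup code.

```c
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup; the root is the one without a parent. */
struct toy_cgroup {
	const struct toy_cgroup *parent;
};

static bool toy_is_root(const struct toy_cgroup *cg)
{
	return cg->parent == NULL;
}

/* RR and FIFO are the two policies the restriction is about. */
static bool toy_policy_is_rt(int policy)
{
	return policy == SCHED_FIFO || policy == SCHED_RR;
}

/*
 * Rule from #4: an RR/FIFO task may only sit in the root cgroup.
 * Non-RT policies are allowed anywhere.
 */
static bool toy_rt_allowed(const struct toy_cgroup *cg, int policy)
{
	return !toy_policy_is_rt(policy) || toy_is_root(cg);
}
```

The only failure mode this leaves is "RT task, non-root cgroup", which
is easy to document and doesn't depend on any controller
configuration.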

What do you think?

Thanks.

--
tejun