Re: [PATCH v5 5/6] clone3: allow spawning processes into cgroups

From: Christian Brauner
Date: Tue Feb 04 2020 - 10:01:51 EST


On Tue, Feb 04, 2020 at 12:53:51PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 21, 2020 at 04:48:43PM +0100, Christian Brauner wrote:
> > This adds support for creating a process in a different cgroup than its
> > parent. Callers can limit and account processes and threads right from
> > the moment they are spawned:
> > - A service manager can directly spawn new services into dedicated
> > cgroups.
> > - A process can be directly created in a frozen cgroup and will be
> > frozen as well.
> > - The initial accounting jitter experienced by process supervisors and
> > daemons is eliminated with this.
> > - Threaded applications or even thread implementations can choose to
> > create a specific cgroup layout where each thread is spawned
> > directly into a dedicated cgroup.
> >
> > This feature is limited to the unified hierarchy. Callers need to pass
> > a directory file descriptor for the target cgroup. The caller can
> > choose to pass an O_PATH file descriptor. All usual migration
> > restrictions apply, i.e. there can be no processes in inner nodes. In
> > general, creating a process directly in a target cgroup adheres to all
> > migration restrictions.
>
> AFAICT, the *big* win here is avoiding the write side of the
> cgroup_threadgroup_rwsem. Or am I mis-reading the patch?

No, you're absolutely right. I just didn't bother putting implementation
specifics in the cover letter and I probably should have. So thanks for
pointing that out!
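
For anyone who wants to see the userspace side, here is a minimal sketch of
how a caller might use this. It assumes the flag and field names from this
series (CLONE_INTO_CGROUP and the cgroup member of struct clone_args). The
struct cg_clone_args mirror, the fallback defines, and the two helper
functions are made up for illustration, so the sketch still compiles against
headers that predate the new member:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallbacks for older installed headers (assumed values, check your uapi). */
#ifndef CLONE_INTO_CGROUP
#define CLONE_INTO_CGROUP 0x200000000ULL
#endif
#ifndef __NR_clone3
#define __NR_clone3 435 /* value on most architectures */
#endif

/* Hypothetical local mirror of struct clone_args including the cgroup member. */
struct cg_clone_args {
	uint64_t flags;
	uint64_t pidfd;
	uint64_t child_tid;
	uint64_t parent_tid;
	uint64_t exit_signal;
	uint64_t stack;
	uint64_t stack_size;
	uint64_t tls;
	uint64_t set_tid;
	uint64_t set_tid_size;
	uint64_t cgroup;
};

/* Eleven 64-bit members, so the struct passed to the kernel is 88 bytes. */
static_assert(sizeof(struct cg_clone_args) == 88, "unexpected clone_args size");

/* Prepare arguments for a fork()-like child placed into the cgroup
 * referred to by cgroup_fd (a directory fd on the unified hierarchy). */
void fill_cgroup_clone_args(struct cg_clone_args *args, int cgroup_fd)
{
	memset(args, 0, sizeof(*args));
	args->flags = CLONE_INTO_CGROUP;
	args->exit_signal = SIGCHLD;
	args->cgroup = (uint64_t)cgroup_fd;
}

/* Returns 0 in the child, the child's pid in the parent, -1 on error. */
pid_t spawn_into_cgroup(int cgroup_fd)
{
	struct cg_clone_args args;

	fill_cgroup_clone_args(&args, cgroup_fd);
	return (pid_t)syscall(__NR_clone3, &args, sizeof(args));
}
```

A caller would open the target cgroup directory (O_PATH is enough), hand the
fd to spawn_into_cgroup(), and handle the return value the same way as for
fork(). The child starts out in the target cgroup, so no separate migration
step is needed afterwards.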

>
> That global lock is what makes moving tasks/threads around super
> expensive; avoiding that by use of this clone() variant wins the day.

:)
Christian