Re: [PATCH] cgroup: Relax restrictions on kernel threads moving out of root cpu cgroup

From: Quentin Perret
Date: Tue Apr 06 2021 - 08:10:50 EST


Hi Pavan,

On Tuesday 06 Apr 2021 at 16:29:13 (+0530), Pavankumar Kondeti wrote:
> In Android GKI, CONFIG_FAIR_GROUP_SCHED is enabled [1] to help prioritize
> important work. Given that the CPU shares of the root cgroup can't be
> changed, leaving tasks in the root cgroup gives them a higher share than
> the tasks in the important cgroups. This is mitigated by moving all tasks
> out of the root cgroup into a different cgroup once Android has booted.
> However, many kernel tasks remain stuck in the root cgroup after boot.
>
> All kworker threads remain in the root cpu cgroup, because tasks with
> the PF_NO_SETAFFINITY flag set are forbidden from cgroup migration.
> This restriction is in place to avoid kworkers being moved into a cpuset
> that conflicts with their affinity. Relax this restriction by explicitly
> checking whether the destination hierarchy has the cpuset controller
> attached. This allows kworkers to be moved out of the root cpu cgroup.
>
> We also see kthreadd_task, and any kernel thread created after Android
> boots, stuck in the root cgroup. The current code prevents kthreadd_task
> from moving out of the root cgroup to avoid the possibility of creating
> new RT kernel threads inside a cgroup with no RT runtime allocated. Apply
> this restriction only when the destination hierarchy has the cpu
> controller attached and CONFIG_RT_GROUP_SCHED is enabled. This allows all
> kernel threads to be moved out of the root cpu cgroup if the kernel does
> not enable RT group scheduling.

OK, so IIUC this only works with cgroup v1 -- the unified hierarchy in
v2 forces you to keep cpu and cpuset in 'sync' (both controllers share
one root, so the cpuset check still applies there). But that should be
fine, so this looks like a nice improvement to me.
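A minimal userspace sketch of the decision the patch makes, for
illustration only. hypo_task, HYPO_CPU_ID, HYPO_CPUSET_ID and
HYPO_PF_NO_SETAFFINITY are hypothetical stand-ins for task_struct,
cpu_cgrp_id, cpuset_cgrp_id and PF_NO_SETAFFINITY, and the two Kconfig
options are modelled as runtime booleans. It shows why a kworker can leave
a v1 cpu-only hierarchy but not a hierarchy that also carries cpuset, as
on the v2 unified root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subsystem ids standing in for cpuset_cgrp_id and
 * cpu_cgrp_id; the real values are generated from the SUBSYS() list. */
enum hypo_subsys_id { HYPO_CPUSET_ID = 0, HYPO_CPU_ID = 1 };

/* Hypothetical stand-in for PF_NO_SETAFFINITY. */
#define HYPO_PF_NO_SETAFFINITY 0x1u

struct hypo_task {
	unsigned int flags;
	bool no_cgroup_migration; /* set on kthreadd in the real kernel */
};

/*
 * Models the patched cgroup_procs_write_start() checks. subsys_mask is the
 * mask of controllers attached to the destination hierarchy's root;
 * rt_group_sched and cpusets model CONFIG_RT_GROUP_SCHED / CONFIG_CPUSETS.
 * Returns true when the migration would be refused with -EINVAL.
 */
static bool migration_denied(const struct hypo_task *tsk, uint16_t subsys_mask,
			     bool rt_group_sched, bool cpusets)
{
	bool pinned = tsk->no_cgroup_migration ||
		      (tsk->flags & HYPO_PF_NO_SETAFFINITY);

	/* RT kthreads must not be born in a cgroup without rt_runtime. */
	if (rt_group_sched && tsk->no_cgroup_migration &&
	    (subsys_mask & (1U << HYPO_CPU_ID)))
		return true;

	/* Affinity-pinned kthreads must not be trapped in a cpuset. */
	if (cpusets && pinned && (subsys_mask & (1U << HYPO_CPUSET_ID)))
		return true;

	return false;
}

/* Example: v1 with separate cpu and cpuset hierarchies vs. the v2 root. */
void example_usage(void)
{
	struct hypo_task kworker = { .flags = HYPO_PF_NO_SETAFFINITY };
	struct hypo_task kthreadd = { .no_cgroup_migration = true };
	uint16_t v1_cpu = 1U << HYPO_CPU_ID;
	uint16_t v1_cpuset = 1U << HYPO_CPUSET_ID;
	uint16_t v2_unified = v1_cpu | v1_cpuset;

	assert(!migration_denied(&kworker, v1_cpu, false, true));
	assert(migration_denied(&kworker, v1_cpuset, false, true));
	assert(migration_denied(&kworker, v2_unified, false, true));
	assert(!migration_denied(&kthreadd, v1_cpu, false, true));
	assert(migration_denied(&kthreadd, v1_cpu, true, true));
}
```

The last two checks show the kthreadd case: a cpu-only v1 hierarchy is
reachable unless RT group scheduling is enabled.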

> [1] https://android.googlesource.com/kernel/common/+/f08f049de11c15a4251cb1db08cf0bee20bd9b59
>
> Signed-off-by: Pavankumar Kondeti <pkondeti@xxxxxxxxxxxxxx>
> ---
> kernel/cgroup/cgroup-internal.h | 3 ++-
> kernel/cgroup/cgroup-v1.c | 2 +-
> kernel/cgroup/cgroup.c | 24 +++++++++++++++++++-----
> 3 files changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
> index bfbeabc..a96ed9a 100644
> --- a/kernel/cgroup/cgroup-internal.h
> +++ b/kernel/cgroup/cgroup-internal.h
> @@ -232,7 +232,8 @@ int cgroup_migrate(struct task_struct *leader, bool threadgroup,
> int cgroup_attach_task(struct cgroup *dst_cgrp, struct task_struct *leader,
> bool threadgroup);
> struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
> - bool *locked)
> + bool *locked,
> + struct cgroup *dst_cgrp)
> __acquires(&cgroup_threadgroup_rwsem);
> void cgroup_procs_write_finish(struct task_struct *task, bool locked)
> __releases(&cgroup_threadgroup_rwsem);
> diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
> index a575178..d674a6c 100644
> --- a/kernel/cgroup/cgroup-v1.c
> +++ b/kernel/cgroup/cgroup-v1.c
> @@ -497,7 +497,7 @@ static ssize_t __cgroup1_procs_write(struct kernfs_open_file *of,
> if (!cgrp)
> return -ENODEV;
>
> - task = cgroup_procs_write_start(buf, threadgroup, &locked);
> + task = cgroup_procs_write_start(buf, threadgroup, &locked, cgrp);
> ret = PTR_ERR_OR_ZERO(task);
> if (ret)
> goto out_unlock;
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 9153b20..41864a8 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -2744,7 +2744,8 @@ int cgroup_attach_task(struct cgroup *dst_cgrp, struct task_struct *leader,
> }
>
> struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
> - bool *locked)
> + bool *locked,
> + struct cgroup *dst_cgrp)
> __acquires(&cgroup_threadgroup_rwsem)
> {
> struct task_struct *tsk;
> @@ -2784,15 +2785,28 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
> tsk = tsk->group_leader;
>
> /*
> + * RT kthreads may be born in a cgroup with no rt_runtime allocated.
> + * Just say no.
> + */
> +#ifdef CONFIG_RT_GROUP_SCHED
> + if (tsk->no_cgroup_migration && (dst_cgrp->root->subsys_mask & (1U << cpu_cgrp_id))) {
> + tsk = ERR_PTR(-EINVAL);
> + goto out_unlock_threadgroup;
> + }
> +#endif
> +
> + /*
> * kthreads may acquire PF_NO_SETAFFINITY during initialization.
> * If userland migrates such a kthread to a non-root cgroup, it can
> - * become trapped in a cpuset, or RT kthread may be born in a
> - * cgroup with no rt_runtime allocated. Just say no.
> + * become trapped in a cpuset. Just say no.
> */
> - if (tsk->no_cgroup_migration || (tsk->flags & PF_NO_SETAFFINITY)) {
> +#ifdef CONFIG_CPUSETS
> + if ((tsk->no_cgroup_migration || (tsk->flags & PF_NO_SETAFFINITY)) &&
> + (dst_cgrp->root->subsys_mask & (1U << cpuset_cgrp_id))) {
> tsk = ERR_PTR(-EINVAL);
> goto out_unlock_threadgroup;
> }
> +#endif

Nit: maybe move this #ifdefery out to a header?
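One way to read the nit, as a sketch: the usual kernel idiom is a pair of
static inline helpers in a header, with a real body under the config
option and a stub returning false otherwise, so the call site in
cgroup.c needs no #ifdef. The helper names (hypo_rt_migration_denied,
hypo_cpuset_migration_denied, hypo_check_migration) are hypothetical, and
the CONFIG_* defines at the top stand in for the generated Kconfig header.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-ins for Kconfig output; in a kernel build these come from the
 * generated autoconf.h. Here cpusets are on and RT group sched is off. */
#define CONFIG_CPUSETS 1
/* #define CONFIG_RT_GROUP_SCHED 1 */

/* --- what the hypothetical header would contain --- */
#ifdef CONFIG_RT_GROUP_SCHED
static inline bool hypo_rt_migration_denied(bool no_cgroup_migration,
					    bool dst_has_cpu)
{
	return no_cgroup_migration && dst_has_cpu;
}
#else
static inline bool hypo_rt_migration_denied(bool no_cgroup_migration,
					    bool dst_has_cpu)
{
	(void)no_cgroup_migration;
	(void)dst_has_cpu;
	return false; /* stub: nothing to enforce without RT group sched */
}
#endif

#ifdef CONFIG_CPUSETS
static inline bool hypo_cpuset_migration_denied(bool pinned, bool dst_has_cpuset)
{
	return pinned && dst_has_cpuset;
}
#else
static inline bool hypo_cpuset_migration_denied(bool pinned, bool dst_has_cpuset)
{
	(void)pinned;
	(void)dst_has_cpuset;
	return false; /* stub: no cpuset can trap the task */
}
#endif

/* --- the call site then reads as straight-line code --- */
static int hypo_check_migration(bool no_cgroup_migration, bool pinned,
				bool dst_has_cpu, bool dst_has_cpuset)
{
	if (hypo_rt_migration_denied(no_cgroup_migration, dst_has_cpu))
		return -EINVAL;
	if (hypo_cpuset_migration_denied(pinned, dst_has_cpuset))
		return -EINVAL;
	return 0;
}

void example_usage(void)
{
	/* kthreadd-like task into a cpu-only v1 hierarchy: allowed here,
	 * because CONFIG_RT_GROUP_SCHED is not defined in this build. */
	assert(hypo_check_migration(true, true, true, false) == 0);
	/* kworker-like task into a cpuset hierarchy: refused. */
	assert(hypo_check_migration(false, true, false, true) == -EINVAL);
}
```

With CONFIG_RT_GROUP_SCHED defined, the first check in example_usage()
would instead return -EINVAL, matching the restriction the patch keeps
for RT group scheduling.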

Thanks,
Quentin