Re: [PATCH 3/6] cgroup: introduce cgroup_taskset and use it insubsys->can_attach(), cancel_attach() and attach()

From: KAMEZAWA Hiroyuki
Date: Wed Aug 24 2011 - 20:47:48 EST


On Wed, 24 Aug 2011 00:19:57 +0200
Tejun Heo <tj@xxxxxxxxxx> wrote:

> Currently, there's no way to pass multiple tasks to cgroup_subsys
> methods necessitating the need for separate per-process and per-task
> methods. This patch introduces cgroup_taskset which can be used to
> pass multiple tasks and their associated cgroups to cgroup_subsys
> methods.
>
> Three methods - can_attach(), cancel_attach() and attach() - are
> converted to use cgroup_taskset. This unifies passed parameters so
> that all methods have access to all information. Conversions in this
> patchset are identical and don't introduce any behavior change.
>
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> Cc: Paul Menage <paul@xxxxxxxxxxxxxx>
> Cc: Li Zefan <lizf@xxxxxxxxxxxxxx>
> Cc: Balbir Singh <bsingharora@xxxxxxxxx>
> Cc: Daisuke Nishimura <nishimura@xxxxxxxxxxxxxxxxx>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> Cc: James Morris <jmorris@xxxxxxxxx>

Thank you for your work. I welcome this !

Some comments around memcg.

> ---
> Documentation/cgroups/cgroups.txt | 26 ++++++----
> include/linux/cgroup.h | 28 +++++++++-
> kernel/cgroup.c | 99 +++++++++++++++++++++++++++++++++----
> kernel/cgroup_freezer.c | 2 +-
> kernel/cpuset.c | 18 ++++---
> mm/memcontrol.c | 16 +++---
> security/device_cgroup.c | 7 ++-
> 7 files changed, 153 insertions(+), 43 deletions(-)
>
> diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
> index cd67e90..2eee7cf 100644
> --- a/Documentation/cgroups/cgroups.txt
> +++ b/Documentation/cgroups/cgroups.txt
> @@ -594,16 +594,21 @@ rmdir() will fail with it. From this behavior, pre_destroy() can be
> called multiple times against a cgroup.
>
> int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
> - struct task_struct *task)
> + struct cgroup_taskset *tset)
> (cgroup_mutex held by caller)
>
> -Called prior to moving a task into a cgroup; if the subsystem
> -returns an error, this will abort the attach operation. If a NULL
> -task is passed, then a successful result indicates that *any*
> -unspecified task can be moved into the cgroup. Note that this isn't
> +Called prior to moving one or more tasks into a cgroup; if the
> +subsystem returns an error, this will abort the attach operation.
> +@tset contains the tasks to be attached and is guaranteed to have at
> +least one task in it. If there are multiple, it's guaranteed that all
> +are from the same thread group,


Do this, "If there are multiple, it's guaranteed that all
are from the same thread group ", means the 'tset' contains
only one mm_struct ?

And is it guaranteed that any task in tset will not be freed while
subsystem routine runs ?

> @tset contains all tasks from the
> +group whether they're actually switching cgroup or not, and the first
> +task is the leader. Each @tset entry also contains the task's old
> +cgroup and tasks which aren't switching cgroup can be skipped easily
> +using the cgroup_taskset_for_each() iterator. Note that this isn't
> called on a fork. If this method returns 0 (success) then this should
> -remain valid while the caller holds cgroup_mutex and it is ensured that either
> -attach() or cancel_attach() will be called in future.
> +remain valid while the caller holds cgroup_mutex and it is ensured
> +that either attach() or cancel_attach() will be called in future.
>
> int can_attach_task(struct cgroup *cgrp, struct task_struct *tsk);
> (cgroup_mutex held by caller)
> @@ -613,14 +618,14 @@ attached (possibly many when using cgroup_attach_proc). Called after
> can_attach.
>
> void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
> - struct task_struct *task, bool threadgroup)
> + struct cgroup_taskset *tset)
> (cgroup_mutex held by caller)
>
> Called when a task attach operation has failed after can_attach() has succeeded.
> A subsystem whose can_attach() has some side-effects should provide this
> function, so that the subsystem can implement a rollback. If not, not necessary.
> This will be called only about subsystems whose can_attach() operation have
> -succeeded.
> +succeeded. The parameters are identical to can_attach().
>
> void pre_attach(struct cgroup *cgrp);
> (cgroup_mutex held by caller)
> @@ -629,11 +634,12 @@ For any non-per-thread attachment work that needs to happen before
> attach_task. Needed by cpuset.
>
> void attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
> - struct cgroup *old_cgrp, struct task_struct *task)
> + struct cgroup_taskset *tset)
> (cgroup_mutex held by caller)
>
> Called after the task has been attached to the cgroup, to allow any
> post-attachment activity that requires memory allocations or blocking.
> +The parameters are identical to can_attach().
>
> void attach_task(struct cgroup *cgrp, struct task_struct *tsk);
> (cgroup_mutex held by caller)
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index da7e4bc..2470c8e 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -457,6 +457,28 @@ void cgroup_exclude_rmdir(struct cgroup_subsys_state *css);
> void cgroup_release_and_wakeup_rmdir(struct cgroup_subsys_state *css);
>
> /*
> + * Control Group taskset, used to pass around set of tasks to cgroup_subsys
> + * methods.
> + */
> +struct cgroup_taskset;
> +struct task_struct *cgroup_taskset_first(struct cgroup_taskset *tset);
> +struct task_struct *cgroup_taskset_next(struct cgroup_taskset *tset);
> +struct cgroup *cgroup_taskset_cur_cgroup(struct cgroup_taskset *tset);
> +int cgroup_taskset_size(struct cgroup_taskset *tset);
> +
> +/**
> + * cgroup_taskset_for_each - iterate cgroup_taskset
> + * @task: the loop cursor
> + * @skip_cgrp: skip if task's cgroup matches this, %NULL to iterate through all
> + * @tset: taskset to iterate
> + */
> +#define cgroup_taskset_for_each(task, skip_cgrp, tset) \
> + for ((task) = cgroup_taskset_first((tset)); (task); \
> + (task) = cgroup_taskset_next((tset))) \
> + if (!(skip_cgrp) || \
> + cgroup_taskset_cur_cgroup((tset)) != (skip_cgrp))
> +
> +/*
> * Control Group subsystem type.
> * See Documentation/cgroups/cgroups.txt for details
> */
> @@ -467,14 +489,14 @@ struct cgroup_subsys {
> int (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
> void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
> int (*can_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
> - struct task_struct *tsk);
> + struct cgroup_taskset *tset);
> int (*can_attach_task)(struct cgroup *cgrp, struct task_struct *tsk);
> void (*cancel_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
> - struct task_struct *tsk);
> + struct cgroup_taskset *tset);
> void (*pre_attach)(struct cgroup *cgrp);
> void (*attach_task)(struct cgroup *cgrp, struct task_struct *tsk);
> void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
> - struct cgroup *old_cgrp, struct task_struct *tsk);
> + struct cgroup_taskset *tset);
> void (*fork)(struct cgroup_subsys *ss, struct task_struct *task);
> void (*exit)(struct cgroup_subsys *ss, struct cgroup *cgrp,
> struct cgroup *old_cgrp, struct task_struct *task);
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index cf5f3e3..474674b 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -1739,11 +1739,85 @@ int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
> }
> EXPORT_SYMBOL_GPL(cgroup_path);
>
> +/*
> + * Control Group taskset
> + */
> struct task_and_cgroup {
> struct task_struct *task;
> struct cgroup *cgrp;
> };
>
> +struct cgroup_taskset {
> + struct task_and_cgroup single;
> + struct flex_array *tc_array;
> + int tc_array_len;
> + int idx;
> + struct cgroup *cur_cgrp;
> +};
> +
> +/**
> + * cgroup_taskset_first - reset taskset and return the first task
> + * @tset: taskset of interest
> + *
> + * @tset iteration is initialized and the first task is returned.
> + */
> +struct task_struct *cgroup_taskset_first(struct cgroup_taskset *tset)
> +{
> + if (tset->tc_array) {
> + tset->idx = 0;
> + return cgroup_taskset_next(tset);
> + } else {
> + tset->cur_cgrp = tset->single.cgrp;
> + return tset->single.task;
> + }
> +}
> +EXPORT_SYMBOL_GPL(cgroup_taskset_first);
> +
> +/**
> + * cgroup_taskset_next - iterate to the next task in taskset
> + * @tset: taskset of interest
> + *
> + * Return the next task in @tset. Iteration must have been initialized
> + * with cgroup_taskset_first().
> + */
> +struct task_struct *cgroup_taskset_next(struct cgroup_taskset *tset)
> +{
> + struct task_and_cgroup *tc;
> +
> + if (!tset->tc_array || tset->idx >= tset->tc_array_len)
> + return NULL;
> +
> + tc = flex_array_get(tset->tc_array, tset->idx++);
> + tset->cur_cgrp = tc->cgrp;
> + return tc->task;
> +}
> +EXPORT_SYMBOL_GPL(cgroup_taskset_next);
> +
> +/**
> + * cgroup_taskset_cur_cgroup - return the matching cgroup for the current task
> + * @tset: taskset of interest
> + *
> + * Return the cgroup for the current (last returned) task of @tset. This
> + * function must be preceded by either cgroup_taskset_first() or
> + * cgroup_taskset_next().
> + */
> +struct cgroup *cgroup_taskset_cur_cgroup(struct cgroup_taskset *tset)
> +{
> + return tset->cur_cgrp;
> +}
> +EXPORT_SYMBOL_GPL(cgroup_taskset_cur_cgroup);
> +
> +/**
> + * cgroup_taskset_size - return the number of tasks in taskset
> + * @tset: taskset of interest
> + */
> +int cgroup_taskset_size(struct cgroup_taskset *tset)
> +{
> + return tset->tc_array ? tset->tc_array_len : 1;
> +}
> +EXPORT_SYMBOL_GPL(cgroup_taskset_size);
> +
> +
> /*
> * cgroup_task_migrate - move a task from one cgroup to another.
> *
> @@ -1828,15 +1902,19 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
> struct cgroup_subsys *ss, *failed_ss = NULL;
> struct cgroup *oldcgrp;
> struct cgroupfs_root *root = cgrp->root;
> + struct cgroup_taskset tset = { };
>
> /* Nothing to do if the task is already in that cgroup */
> oldcgrp = task_cgroup_from_root(tsk, root);
> if (cgrp == oldcgrp)
> return 0;
>
> + tset.single.task = tsk;
> + tset.single.cgrp = oldcgrp;
> +
> for_each_subsys(root, ss) {
> if (ss->can_attach) {
> - retval = ss->can_attach(ss, cgrp, tsk);
> + retval = ss->can_attach(ss, cgrp, &tset);
> if (retval) {
> /*
> * Remember on which subsystem the can_attach()
> @@ -1867,7 +1945,7 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
> if (ss->attach_task)
> ss->attach_task(cgrp, tsk);
> if (ss->attach)
> - ss->attach(ss, cgrp, oldcgrp, tsk);
> + ss->attach(ss, cgrp, &tset);
> }
>
> synchronize_rcu();
> @@ -1889,7 +1967,7 @@ out:
> */
> break;
> if (ss->cancel_attach)
> - ss->cancel_attach(ss, cgrp, tsk);
> + ss->cancel_attach(ss, cgrp, &tset);
> }
> }
> return retval;
> @@ -2005,6 +2083,7 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
> struct task_struct *tsk;
> struct task_and_cgroup *tc;
> struct flex_array *group;
> + struct cgroup_taskset tset = { };
> /*
> * we need to make sure we have css_sets for all the tasks we're
> * going to move -before- we actually start moving them, so that in
> @@ -2067,6 +2146,8 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
> } while_each_thread(leader, tsk);
> /* remember the number of threads in the array for later. */
> group_size = i;
> + tset.tc_array = group;
> + tset.tc_array_len = group_size;
> rcu_read_unlock();
>
> /* methods shouldn't be called if no task is actually migrating */
> @@ -2079,7 +2160,7 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
> */
> for_each_subsys(root, ss) {
> if (ss->can_attach) {
> - retval = ss->can_attach(ss, cgrp, leader);
> + retval = ss->can_attach(ss, cgrp, &tset);
> if (retval) {
> failed_ss = ss;
> goto out_cancel_attach;
> @@ -2169,10 +2250,8 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
> * being moved, this call will need to be reworked to communicate that.
> */
> for_each_subsys(root, ss) {
> - if (ss->attach) {
> - tc = flex_array_get(group, 0);
> - ss->attach(ss, cgrp, tc->cgrp, tc->task);
> - }
> + if (ss->attach)
> + ss->attach(ss, cgrp, &tset);
> }
>
> /*
> @@ -2194,11 +2273,11 @@ out_cancel_attach:
> for_each_subsys(root, ss) {
> if (ss == failed_ss) {
> if (cancel_failed_ss && ss->cancel_attach)
> - ss->cancel_attach(ss, cgrp, leader);
> + ss->cancel_attach(ss, cgrp, &tset);
> break;
> }
> if (ss->cancel_attach)
> - ss->cancel_attach(ss, cgrp, leader);
> + ss->cancel_attach(ss, cgrp, &tset);
> }
> }
> out_put_tasks:
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index 4e82525..a2b0082 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -159,7 +159,7 @@ static void freezer_destroy(struct cgroup_subsys *ss,
> */
> static int freezer_can_attach(struct cgroup_subsys *ss,
> struct cgroup *new_cgroup,
> - struct task_struct *task)
> + struct cgroup_taskset *tset)
> {
> struct freezer *freezer;
>
> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> index 10131fd..2e5825b 100644
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -1368,10 +1368,10 @@ static int fmeter_getrate(struct fmeter *fmp)
> }
>
> /* Called by cgroups to determine if a cpuset is usable; cgroup_mutex held */
> -static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cont,
> - struct task_struct *tsk)
> +static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
> + struct cgroup_taskset *tset)
> {
> - struct cpuset *cs = cgroup_cs(cont);
> + struct cpuset *cs = cgroup_cs(cgrp);
>
> if (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed))
> return -ENOSPC;
> @@ -1384,7 +1384,7 @@ static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cont,
> * set_cpus_allowed_ptr() on all attached tasks before cpus_allowed may
> * be changed.
> */
> - if (tsk->flags & PF_THREAD_BOUND)
> + if (cgroup_taskset_first(tset)->flags & PF_THREAD_BOUND)
> return -EINVAL;
>
> return 0;
> @@ -1434,12 +1434,14 @@ static void cpuset_attach_task(struct cgroup *cont, struct task_struct *tsk)
> cpuset_update_task_spread_flag(cs, tsk);
> }
>
> -static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cont,
> - struct cgroup *oldcont, struct task_struct *tsk)
> +static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
> + struct cgroup_taskset *tset)
> {
> struct mm_struct *mm;
> - struct cpuset *cs = cgroup_cs(cont);
> - struct cpuset *oldcs = cgroup_cs(oldcont);
> + struct task_struct *tsk = cgroup_taskset_first(tset);
> + struct cgroup *oldcgrp = cgroup_taskset_cur_cgroup(tset);
> + struct cpuset *cs = cgroup_cs(cgrp);
> + struct cpuset *oldcs = cgroup_cs(oldcgrp);
>
> /*
> * Change mm, possibly for multiple threads in a threadgroup. This is
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 930de94..b2802cc 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5460,8 +5460,9 @@ static void mem_cgroup_clear_mc(void)
>
> static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
> struct cgroup *cgroup,
> - struct task_struct *p)
> + struct cgroup_taskset *tset)
> {
> + struct task_struct *p = cgroup_taskset_first(tset);
> int ret = 0;
> struct mem_cgroup *mem = mem_cgroup_from_cont(cgroup);
>

Ah..hmm. I think this doesn't work as expected for memcg.
Maybe code like this will be required.

{
struct mem_cgroup *mem = mem_cgroup_from_cont(cgroup);
struct mem_cgroup *from = NULL;
struct task_struct *p;
int ret = 0;
/*
* memcg just works against mm-owner. Check mm-owner is in this cgroup.
* Because tset contains only one thread-group, we'll find a task of
* mm->owner, at most.
*/
for_cgroup_taskset_for_each(task, NULL, tset) {
struct mm_struct *mm;

mm = get_task_mm(task);
if (!mm)
continue;
if (mm->owner == task) {
p = task;
break;
}
mmput(mm);
}
if (!p)
return ret;
from = mem_cgroup_from_task(p);
mem_cgroup_start_move(from);
spin_lock(&mc.lock);
mc.from = from;
mc.to = mem;
spin_unlock(&mc.lock);
/* We set mc.moving_task later */

ret = mem_cgroup_precharge_mc(mm);
if (ret)
mem_cgroup_clear_mc();
mm_put(mm);
return ret;
}



> @@ -5499,7 +5500,7 @@ static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
>
> static void mem_cgroup_cancel_attach(struct cgroup_subsys *ss,
> struct cgroup *cgroup,
> - struct task_struct *p)
> + struct cgroup_taskset *tset)
> {
> mem_cgroup_clear_mc();
> }
> @@ -5616,9 +5617,9 @@ retry:
>
> static void mem_cgroup_move_task(struct cgroup_subsys *ss,
> struct cgroup *cont,
> - struct cgroup *old_cont,
> - struct task_struct *p)
> + struct cgroup_taskset *tset)
> {
> + struct task_struct *p = cgroup_taskset_first(tset);
> struct mm_struct *mm = get_task_mm(p);
>

Similar code with can_attach() will be required.

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/