[RFC][PATCH 1/2] memcg: Ensure every task that uses an mm is in the same memory cgroup
From: Eric W. Biederman
Date: Fri Jun 01 2018 - 10:53:21 EST
From a userspace perspective the cgroup of an mm is determined by which
cgroup the task belongs to. As an mm can in practice only belong to a
single memory cgroup, having multiple tasks with the same mm in different
memory cgroups is not well defined.
Avoid the difficulties of dealing with crazy semantics and restrict all
tasks that share a single mm to the same memory cgroup.
This is accomplished by adding a new function mem_cgroup_mm_can_attach
that checks this condition with a straightforward algorithm. In the worst
case it is O(N^2), but in the common case it should be O(N) in the number
of tasks being migrated: the inner counting loop only runs once per
distinct mm. Typically only a single process, and thus a single mm, is
being migrated, and the function is optimized for that case.
There are users of mmget, such as proc, that can create references to an
mm that this function can not find. This results in an unnecessary
migration failure. It does not break the invariant that every task of
an mm stays in the same memory cgroup, so this condition is annoying
but harmless.
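To make that concrete, here is a minimal, hypothetical sketch (the helper
name is made up for illustration; get_task_mm and mmput are the real
interfaces proc builds on) of how such an unseen reference arises:

/*
 * Hypothetical proc-style reader: get_task_mm() takes an mm_users
 * reference on another task's mm, so for a moment mm_users is one
 * higher than the number of tasks sharing the mm.  While this
 * reference is held, the mm_count != mm_users check below fails
 * and the migration is rejected.
 */
static int example_inspect_mm(struct task_struct *task)
{
        struct mm_struct *mm = get_task_mm(task);

        if (!mm)
                return -ESRCH;
        /* ... read from the mm, as fs/proc does ... */
        mmput(mm);      /* drop the extra reference */
        return 0;
}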
This requires multi-threaded mm's to be migrated using the procs file.
It effectively forbids migrating a process whose mm is shared with
another process, although enabling the control file might work.
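As a usage sketch (userspace, with the path handling illustrative only):
migrating a whole process then looks like the following, while writing a
single TID of a multi-threaded process to the tasks file would fail with
EINVAL under this patch.

#include <stdio.h>
#include <sys/types.h>

/* Move every thread of @pid at once by writing the TGID to
 * cgroup.procs, so all users of the mm stay in one memory cgroup. */
static int move_process(const char *cgroup_dir, pid_t pid)
{
        char path[256];
        FILE *f;
        int ret = 0;

        snprintf(path, sizeof(path), "%s/cgroup.procs", cgroup_dir);
        f = fopen(path, "w");
        if (!f)
                return -1;
        if (fprintf(f, "%d\n", (int)pid) < 0)
                ret = -1;
        if (fclose(f))          /* a rejected migration surfaces here */
                ret = -1;
        return ret;
}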
Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
---
mm/memcontrol.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 50 insertions(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 21c13f4768eb..078ef562bb90 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4798,6 +4798,50 @@ static void mem_cgroup_clear_mc(void)
 	mmput(mm);
 }
 
+static int mem_cgroup_mm_can_attach(struct cgroup_taskset *tset)
+{
+	struct cgroup_subsys_state *css, *tcss;
+	struct task_struct *p, *t;
+	struct mm_struct *mm = NULL;
+	int ret = -EINVAL;
+
+	/*
+	 * Ensure all references for every mm can be accounted for by
+	 * tasks that are being migrated.
+	 */
+	rcu_read_lock();
+	cgroup_taskset_for_each(p, css, tset) {
+		int mm_users, mm_count;
+
+		/* Does this task have an mm that has not been visited? */
+		if (!p->mm || (p->flags & PF_KTHREAD) || (p->mm == mm))
+			continue;
+
+		mm = p->mm;
+		mm_users = atomic_read(&mm->mm_users);
+		if (mm_users == 1)
+			continue;
+
+		mm_count = 0;
+		cgroup_taskset_for_each(t, tcss, tset) {
+			if (t->mm != mm)
+				continue;
+			mm_count++;
+		}
+		/*
+		 * If there are no non-task references mm_users will
+		 * be stable as holding cgroup_threadgroup_rwsem for
+		 * write blocks fork and exit.
+		 */
+		if (mm_count != mm_users)
+			goto out;
+	}
+	ret = 0;
+out:
+	rcu_read_unlock();
+	return ret;
+}
+
 static int mem_cgroup_can_attach(struct cgroup_taskset *tset)
 {
 	struct cgroup_subsys_state *css;
@@ -4806,7 +4850,12 @@ static int mem_cgroup_can_attach(struct cgroup_taskset *tset)
 	struct task_struct *leader, *p;
 	struct mm_struct *mm;
 	unsigned long move_flags;
-	int ret = 0;
+	int ret;
+
+	/* Is every task of every mm in tset being moved? */
+	ret = mem_cgroup_mm_can_attach(tset);
+	if (ret)
+		return ret;
 
 	/* charge immigration isn't supported on the default hierarchy */
 	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
--
2.14.1