[PATCH] cgroups: fix probable race with put_css_set[_taskexit] and find_css_set

From: Lai Jiangshan
Date: Tue Aug 19 2008 - 02:31:47 EST


put_css_set_taskexit() may run on one CPU while find_css_set() is running
on another CPU, and the following race can occur:

put_css_set_taskexit side               find_css_set side

                                        |
atomic_dec_and_test(&kref->refcount)    |
/* kref->refcount = 0 */                |
....................................................................
                                        | read_lock(&css_set_lock)
                                        | find_existing_css_set
                                        | get_css_set
                                        | read_unlock(&css_set_lock);
....................................................................
__release_css_set                       |
....................................................................
                                        | /* use a released css_set */
                                        |


[put_css_set has the same problem in principle. But in the current code,
all put_css_set calls are made inside the cgroup mutex critical region,
as is find_css_set, so they cannot race with it.]


Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
---
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 13932ab..003912e 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -241,7 +241,6 @@ static void unlink_css_set(struct css_set *cg)
struct cg_cgroup_link *link;
struct cg_cgroup_link *saved_link;

- write_lock(&css_set_lock);
hlist_del(&cg->hlist);
css_set_count--;

@@ -251,8 +250,6 @@ static void unlink_css_set(struct css_set *cg)
list_del(&link->cgrp_link_list);
kfree(link);
}
-
- write_unlock(&css_set_lock);
}

static void __release_css_set(struct kref *k, int taskexit)
@@ -260,7 +257,13 @@ static void __release_css_set(struct kref *k, int taskexit)
int i;
struct css_set *cg = container_of(k, struct css_set, ref);

+ write_lock(&css_set_lock);
+ if (atomic_read(&k->refcount) > 0) {
+ write_unlock(&css_set_lock);
+ return;
+ }
unlink_css_set(cg);
+ write_unlock(&css_set_lock);

rcu_read_lock();
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
@@ -410,6 +413,20 @@ static struct css_set *find_css_set(
* the desired set */
read_lock(&css_set_lock);
res = find_existing_css_set(oldcg, cgrp, template);
+ /*
+ * put_css_set[_taskexit]() may race with find_css_set(), in that
+ * find_css_set() got the css_set after put_css_set() had released it.
+ *
+ * We should put the whole put_css_set[_taskexit]() into css_set_lock's
+ * write_lock critical section to avoid this race. But it will increase
+ * overhead for do_exit().
+ *
+ * So we do not avoid this race but put it under control:
+ * __release_css_set() will re-check the refcount
+ * with css_set_lock held.
+ *
+ * This race may trigger the warning in kref_get().
+ */
if (res)
get_css_set(res);
read_unlock(&css_set_lock);
