Re: query: [PATCH 2/2] cgroup: Remove call to synchronize_rcu incgroup_attach_task

From: Mike Galbraith
Date: Tue Apr 12 2011 - 23:12:04 EST

Next message: George Spelvin: "Re: Uhhuh. NMI received for unknown reason 2d on CPU 0."
Previous message: Tejun Heo: "Re: [PATCH 4/4]percpu_counter: use atomic64 for counter"
In reply to: Li Zefan: "Re: query: [PATCH 2/2] cgroup: Remove call to synchronize_rcu incgroup_attach_task"
Next in thread: Paul Menage: "Re: query: [PATCH 2/2] cgroup: Remove call to synchronize_rcu in cgroup_attach_task"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, 2011-04-13 at 10:02 +0800, Li Zefan wrote:
> Mike Galbraith wrote:
> > Greetings,
> >
> > Wrt these patches:
> >
> > https://lkml.org/lkml/2010/11/24/14 [PATCH 1/2] cgroup: Set CGRP_RELEASABLE when adding to a cgroup
> > https://lkml.org/lkml/2010/11/24/15 [PATCH 2/2] cgroup: Remove call to synchronize_rcu in cgroup_attach_task
> >
> > I received a query regarding 2/2 because a large database company is
> > apparently moving tasks between cgroups frequently enough that their
> > database initialization time dropped from ~11 hours to ~4 hours when
> > they applied this patch.
> >
> > Curious why these got no traction.
>
> I thought Paul was following the this. I'll spend some time on patch
> review.

Great!

Three orders of magnitude latency improvements are a terrible thing to
waste ;-) I tried doing it a bit differently, but would have ended up
about the same due to the need for rmdir to succeed after the attach
(detach of last task) returns.

However...

If the user _does_ that rmdir(), it's more or less back to square one.
RCU grace periods should not impact userland, but if you try to do
create/attach/detach/destroy, you run into the same bottleneck, as does
any asynchronous GC, though that's not such a poke in the eye. I tried
a straight forward move to schedule_work(), and it seems to work just
fine. rmdir() no longer takes ~30ms on my box, but closer to 20us.

cgroups: Remove call to synchronize_rcu() in cgroup_diput()

Instead of synchronously waiting via synchronize_rcu(), then initiating cgroup
destruction, schedule asynchronous destruction via call_rcu()->schedule_work()
and move along smartly.

Some numbers:
1000 x simple loop - create/attach self/detatch self/destroy, zero work.

Virgin source
real 1m39.713s 1.000000
user 0m0.000s
sys 0m0.076s

+ Android commits 60cdbd1f and 05946a1
real 0m33.627s .337237
user 0m0.056s
sys 0m0.000s

+ Android commits + below
real 0m0.046s .000461
user 0m0.000s
sys 0m0.044s

Not-signed-off-by: Mike Galbraith <efault@xxxxxx>

---
include/linux/cgroup.h | 1
kernel/cgroup.c | 59 +++++++++++++++++++++++++++----------------------
2 files changed, 34 insertions(+), 26 deletions(-)

Index: linux-2.6.39.git/include/linux/cgroup.h
===================================================================
--- linux-2.6.39.git.orig/include/linux/cgroup.h
+++ linux-2.6.39.git/include/linux/cgroup.h
@@ -231,6 +231,7 @@ struct cgroup {

/* For RCU-protected deletion */
struct rcu_head rcu_head;
+ struct work_struct work;

/* List of events which userspace want to recieve */
struct list_head event_list;
Index: linux-2.6.39.git/kernel/cgroup.c
===================================================================
--- linux-2.6.39.git.orig/kernel/cgroup.c
+++ linux-2.6.39.git/kernel/cgroup.c
@@ -836,11 +836,42 @@ static int cgroup_call_pre_destroy(struc
return ret;
}

+static void free_cgroup_work(struct work_struct *work)
+{
+ struct cgroup *cgrp = container_of(work, struct cgroup, work);
+ struct cgroup_subsys *ss;
+
+ mutex_lock(&cgroup_mutex);
+ /*
+ * Release the subsystem state objects.
+ */
+ for_each_subsys(cgrp->root, ss)
+ ss->destroy(ss, cgrp);
+
+ cgrp->root->number_of_cgroups--;
+ mutex_unlock(&cgroup_mutex);
+
+ /*
+ * Drop the active superblock reference that we took when we
+ * created the cgroup
+ */
+ deactivate_super(cgrp->root->sb);
+
+ /*
+ * if we're getting rid of the cgroup, refcount should ensure
+ * that there are no pidlists left.
+ */
+ BUG_ON(!list_empty(&cgrp->pidlists));
+
+ kfree(cgrp);
+}
+
static void free_cgroup_rcu(struct rcu_head *obj)
{
struct cgroup *cgrp = container_of(obj, struct cgroup, rcu_head);

- kfree(cgrp);
+ INIT_WORK(&cgrp->work, free_cgroup_work);
+ schedule_work(&cgrp->work);
}

static void cgroup_diput(struct dentry *dentry, struct inode *inode)
@@ -848,7 +879,7 @@ static void cgroup_diput(struct dentry *
/* is dentry a directory ? if so, kfree() associated cgroup */
if (S_ISDIR(inode->i_mode)) {
struct cgroup *cgrp = dentry->d_fsdata;
- struct cgroup_subsys *ss;
+
BUG_ON(!(cgroup_is_removed(cgrp)));
/* It's possible for external users to be holding css
* reference counts on a cgroup; css_put() needs to
@@ -856,30 +887,6 @@ static void cgroup_diput(struct dentry *
* the reference count in order to know if it needs to
* queue the cgroup to be handled by the release
* agent */
- synchronize_rcu();
-
- mutex_lock(&cgroup_mutex);
- /*
- * Release the subsystem state objects.
- */
- for_each_subsys(cgrp->root, ss)
- ss->destroy(ss, cgrp);
-
- cgrp->root->number_of_cgroups--;
- mutex_unlock(&cgroup_mutex);
-
- /*
- * Drop the active superblock reference that we took when we
- * created the cgroup
- */
- deactivate_super(cgrp->root->sb);
-
- /*
- * if we're getting rid of the cgroup, refcount should ensure
- * that there are no pidlists left.
- */
- BUG_ON(!list_empty(&cgrp->pidlists));
-
call_rcu(&cgrp->rcu_head, free_cgroup_rcu);
}
iput(inode);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: George Spelvin: "Re: Uhhuh. NMI received for unknown reason 2d on CPU 0."
Previous message: Tejun Heo: "Re: [PATCH 4/4]percpu_counter: use atomic64 for counter"
In reply to: Li Zefan: "Re: query: [PATCH 2/2] cgroup: Remove call to synchronize_rcu incgroup_attach_task"
Next in thread: Paul Menage: "Re: query: [PATCH 2/2] cgroup: Remove call to synchronize_rcu in cgroup_attach_task"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]