[PATCH 3/4] threadgroup: extend threadgroup_lock() to cover exit and exec

From: Tejun Heo
Date: Sun Sep 04 2011 - 14:02:29 EST


From: Tejun Heo <tj@xxxxxxxxxx>

threadgroup_lock() previously protected only against new additions to
the threadgroup, which was inherently incomplete and problematic for
its only user, cgroup. Ongoing migration could race against exec and
exit, leading to interesting problems - broken symmetry between the
various attach methods, tasks exiting during method execution,
->exit() racing against attach methods, a migrating task switching
basic properties during exec, and so on.

This patch extends threadgroup_lock() such that it protects against
all three threadgroup-altering operations - fork, exit and exec. For
exit, threadgroup_change_begin/done() calls are added to the exit
path. For exec, threadgroup_[un]lock() are updated to also grab and
release cred_guard_mutex.

With this change, threadgroup_lock() guarantees that the target
threadgroup will remain stable - no new task will be added, no new
PF_EXITING will be set and exec won't happen.

The next patch will update cgroup so that it can take full advantage
of this change.

Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Paul Menage <menage@xxxxxxxxxx>
Cc: Li Zefan <lizf@xxxxxxxxxxxxxx>
---
include/linux/sched.h |   32 ++++++++++++++++++++++++++------
kernel/exit.c         |   16 ++++++++++++----
2 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index da74d6f..179cbdc 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -635,11 +635,12 @@ struct signal_struct {
#endif
#ifdef CONFIG_CGROUPS
/*
- * The group_rwsem prevents threads from forking with
- * CLONE_THREAD while held for writing. Use this for fork-sensitive
- * threadgroup-wide operations. It's taken for reading in fork.c in
- * copy_process().
- * Currently only needed write-side by cgroups.
+ * group_rwsem prevents new tasks from entering the threadgroup and
+ * member tasks from exiting. fork and exit paths are protected
+ * with this rwsem using threadgroup_change_begin/end(). Users
+ * which require threadgroup to remain stable should use
+ * threadgroup_[un]lock() which also takes care of exec path.
+ * Currently, cgroup is the only user.
*/
struct rw_semaphore group_rwsem;
#endif
@@ -2364,7 +2365,6 @@ static inline void unlock_task_sighand(struct task_struct *tsk,
spin_unlock_irqrestore(&tsk->sighand->siglock, *flags);
}

-/* See the declaration of group_rwsem in signal_struct. */
#ifdef CONFIG_CGROUPS
static inline void threadgroup_change_begin(struct task_struct *tsk)
{
@@ -2374,13 +2374,33 @@ static inline void threadgroup_change_done(struct task_struct *tsk)
{
up_read(&tsk->signal->group_rwsem);
}
+
+/**
+ * threadgroup_lock - lock threadgroup
+ * @tsk: member task of the threadgroup to lock
+ *
+ * Lock the threadgroup @tsk belongs to. No new task is allowed to enter
+ * and member tasks aren't allowed to exit (as indicated by PF_EXITING) or
+ * perform exec. This is useful for cases where the threadgroup needs to
+ * stay stable across blockable operations.
+ */
static inline void threadgroup_lock(struct task_struct *tsk)
{
+ /* exec uses exit for de-threading, grab cred_guard_mutex first */
+ mutex_lock(&tsk->signal->cred_guard_mutex);
down_write(&tsk->signal->group_rwsem);
}
+
+/**
+ * threadgroup_unlock - unlock threadgroup
+ * @tsk: member task of the threadgroup to unlock
+ *
+ * Reverse threadgroup_lock().
+ */
static inline void threadgroup_unlock(struct task_struct *tsk)
{
up_write(&tsk->signal->group_rwsem);
+ mutex_unlock(&tsk->signal->cred_guard_mutex);
}
#else
static inline void threadgroup_change_begin(struct task_struct *tsk) {}
diff --git a/kernel/exit.c b/kernel/exit.c
index 7b6c4fa..a5b40b3 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -936,6 +936,12 @@ NORET_TYPE void do_exit(long code)
schedule();
}

+ /*
+ * @tsk's threadgroup is going through changes - lock out users
+ * which expect stable threadgroup.
+ */
+ threadgroup_change_begin(tsk);
+
exit_irq_thread();

exit_signals(tsk); /* sets PF_EXITING */
@@ -1018,10 +1024,6 @@ NORET_TYPE void do_exit(long code)
kfree(current->pi_state_cache);
#endif
/*
- * Make sure we are holding no locks:
- */
- debug_check_no_locks_held(tsk);
- /*
* We can do this unlocked here. The futex code uses this flag
* just to verify whether the pi state cleanup has been done
* or not. In the worst case it loops once more.
@@ -1039,6 +1041,12 @@ NORET_TYPE void do_exit(long code)
preempt_disable();
exit_rcu();

+ /*
+ * Release threadgroup and make sure we are holding no locks.
+ */
+ threadgroup_change_done(tsk);
+ debug_check_no_locks_held(tsk);
+
/* this task is now dead and freezer should ignore it */
current->flags |= PF_NOFREEZE;

--
1.7.6
