Possible race between multi-threaded coredumps and fork

From: Corey Minyard
Date: Thu Jun 03 2004 - 10:14:41 EST


A customer had a field problem with multi-threaded coredumps,
and I think the fix is the exit race (already fixed in 2.6), but I
also noticed a possible race between taking a coredump and fork().

If zap_threads() in fs/exec.c is called while a thread is in do_fork(),
but before the newly created thread is in the thread list, it is possible
to have a running thread during the coredump. This is bad, but not
terribly bad. However, in this situation it is also possible in
__exit_mm() that the final two threads check mm->core_waiters at
the same time and then both decrement mm->core_waiters, causing
it to go negative, and thus causing a BUG() in coredump_wait().

To fix this, I would like to propose the attached patch to 2.6.

Signed-off-by: Corey Minyard <minyard@xxxxxxx>

--- linux.orig/kernel/fork.c 2004-05-21 11:49:07.000000000 -0500
+++ linux/kernel/fork.c 2004-06-03 09:46:00.000000000 -0500
@@ -1011,6 +1011,9 @@
INIT_LIST_HEAD(&p->ptrace_children);
INIT_LIST_HEAD(&p->ptrace_list);

+ /* Need the mmap_sem lock for coredump synchronization. */
+ if (p->mm)
+ down_read(&p->mm->mmap_sem);
/* Need tasklist lock for parent etc handling! */
write_lock_irq(&tasklist_lock);
/*
@@ -1019,6 +1022,7 @@
*/
if (sigismember(&current->pending.signal, SIGKILL)) {
write_unlock_irq(&tasklist_lock);
+ up_read(&p->mm->mmap_sem);
retval = -EINTR;
goto bad_fork_cleanup_namespace;
}
@@ -1040,6 +1044,7 @@
if (current->signal->group_exit) {
spin_unlock(&current->sighand->siglock);
write_unlock_irq(&tasklist_lock);
+ up_read(&p->mm->mmap_sem);
retval = -EAGAIN;
goto bad_fork_cleanup_namespace;
}
@@ -1075,6 +1080,23 @@

nr_threads++;
write_unlock_irq(&tasklist_lock);
+
+ if (p->mm) {
+ if (p->mm->core_waiters) {
+ /*
+ * The thread group I am in is currently
+ * taking a coredump so add the new thread as
+ * a coredump candidate force the new thread
+ * to die immediately.
+ */
+ up_read(&p->mm->mmap_sem);
+ down_write(&p->mm->mmap_sem);
+ force_sig_specific(SIGKILL, p);
+ p->mm->core_waiters++;
+ up_write(&p->mm->mmap_sem);
+ } else
+ up_read(&p->mm->mmap_sem);
+ }
retval = 0;

fork_out: