[PATCH 2.6.13-stable] cpuset semaphore double trip fix

From: Paul Jackson
Date: Fri Sep 09 2005 - 19:46:35 EST


Code reading uncovered a potential deadlock on the global cpuset
semaphore, cpuset_sem.

==> This patch is only useful in the 2.6.13-stable series.

(It's harmless, and useless, in the pre 2.6.14 fork)

The pre-2.6.14 fork has already diverged, with an additional patch
that further aggrevated this problem, and a more thorough overhaul
of the cpuset locking, to fix the problems.

All code paths in kernel/cpuset.c (2.6.13 or earlier) that first
grab cpuset_sem and then allocate memory _must_ call the routine
'refresh_mems()', after getting cpuset_sem, before any possible
allocation.

If this refresh_mems() call is not done, then there is a risk that one
of the cpuset_zone_allowed() calls made from within the page allocator
(__alloc_pages) will find that the mems_generation of the current task
doesn't match that of its cpuset, causing it to try to grab cpuset_sem.
Since it already held cpuset_sem, this deadlocks that task, and any
subsequent task wanting cpuset_sem.

==> The code paths leading to the kmalloc in check_for_release(), from
cpuset_exit, cpuset_rmdir and attach_task (for the detached cpuset),
fail to invoke refresh_mems() as required.

The fix is easy enough - add the requisite refresh_mems() call.

Unless someone is rapidly creating, modifying and destroying cpusets,
they are unlikely to have any chance of encountering this deadlock.
And even then, it is apparently difficult to do so.

In the case we got here from cpuset_exit(), we have already torn
down the tasks connection to this cpuset and current->cpuset is NULL.
Don't call refresh_mems() in that case - it oops the kernel.

Signed-off-by: Paul Jackson <pj@xxxxxxx>

Index: linux-2.6.13-mem_exclusive_oom/kernel/cpuset.c
===================================================================
--- linux-2.6.13-mem_exclusive_oom.orig/kernel/cpuset.c
+++ linux-2.6.13-mem_exclusive_oom/kernel/cpuset.c
@@ -458,7 +458,10 @@ static void check_for_release(struct cpu
if (notify_on_release(cs) && atomic_read(&cs->count) == 0 &&
list_empty(&cs->children)) {
char *buf;
+ static void refresh_mems(void);

+ if (current->cpuset)
+ refresh_mems();
buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
if (!buf)
return;

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxx> 1.650.933.1373
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/