Re: [PATCH] Fix deadlock in RPC scheduling code.

From: Trond Myklebust
Date: Thu Mar 09 2006 - 09:38:14 EST


On Thu, 2006-03-09 at 11:35 +0100, Aurelien Degremont wrote:
> Hello,
>
> Here is a small patch which fixes a deadlock issue in the RPC scheduling
> code (rpc_wake_up_task() is mainly concerned). The problem happens when an
> RPC task that is waiting for the response to a sync request is woken up by
> a signal while, at the same time, the kernel tries to wake up some RPC
> tasks. In that case a deadlock can occur, due to an error in the RPC
> workqueue lock management.
>
> This error was introduced by linux-2.6.8.1-49-rpc_workqueue.patch (included
> in 2.6.11), and the code has not changed since. The issue was detected on
> linux-2.6.12.
>
>
> **DESCRIPTION**
>
> This deadlock is due to incorrect management of two locks: queue->lock
> and the RPC_TASK_WAKEUP bit in task->tk_runstate.
> When dealing with RPC workqueues, the code must take the associated queue
> lock before making any modification to the queue or its tasks. And it
> does so... except in one function. Nested locks should always be taken
> in the same order.
>
> As a consequence, in some circumstances (I have in fact seen it several
> times), this code deadlocks. Moreover, when this code is called, for
> example from nfs_file_flush(), lock_kernel() is held, so the whole
> system hangs.
>
>
> **PATCH**
>
> To fix the problem, we only need to invert the lock hierarchy in
> rpc_wake_up_task() and take care to check that the task is really queued
> before trying to grab the lock.
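In other words, the proposal boils down to roughly this (my sketch of the
idea, not the patch as posted; details omitted):

        static void rpc_wake_up_task(struct rpc_task *task)
        {
                if (RPC_IS_QUEUED(task)) {
                        struct rpc_wait_queue *queue = task->u.tk_wait.rpc_waitq;

                        spin_lock_bh(&queue->lock);
                        /* claim RPC_TASK_WAKEUP and wake the task under queue->lock */
                        spin_unlock_bh(&queue->lock);
                }
        }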

No you can't. You have absolutely _no_ guarantee that
task->u.tk_wait.rpc_waitq is valid here.
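Between the RPC_IS_QUEUED() test and the dereference nothing pins the task
to that queue, so (an illustrative interleaving, simplified):

        CPU 0: rpc_wake_up_task()            CPU 1: another wakeup path
        -----------------------------------  ----------------------------------
        RPC_IS_QUEUED(task) -> true
                                             takes queue->lock, dequeues task
                                             task resumes; it may go to sleep
                                             on a different queue, or finish
        queue = task->u.tk_wait.rpc_waitq;   <- possibly stale pointer
        spin_lock_bh(&queue->lock);          <- possibly the wrong lock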

The real fix is the one I posted in response to this thread last week.
See

http://client.linux-nfs.org/Linux-2.6.x/2.6.16-rc5/linux-2.6.16-89-fix_race_in_rpc_wakeup.dif

Cheers,
Trond
