[RFC][PATCH] Fix hang in posix_locks_deadlock()

From: George G. Davis
Date: Wed Oct 17 2007 - 14:52:17 EST


From: Armin Kuster <AKuster@xxxxxxxxxx>

We have observed hangs in posix_locks_deadlock() when multiple threads
use fcntl(2) F_SETLKW to synchronize file accesses. The problem appears
to be an error in the implementation of posix_locks_deadlock(): "goto
next_task" is used to break out of the list_for_each_entry() search of
blocked_list, after which the posix_same_owner(caller_fl, block_fl)
test may evaluate to false and the list_for_each_entry() loop starts
over from the head of the list. If this keeps happening,
posix_locks_deadlock() never returns. The workaround is to change the
posix_same_owner() test inside the list_for_each_entry() loop to
compare caller_fl directly against the current fl entry.


Signed-off-by: Armin Kuster <AKuster@xxxxxxxxxx>
Signed-off-by: George G. Davis <gdavis@xxxxxxxxxx>

---
Not sure if this is the correct fix but it does resolve the hangs we're
observing in posix_locks_deadlock(). Comments greatly appreciated...
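
For anyone who wants to poke at the restart behaviour described in the
commit message without a kernel build, here is a rough user-space model
of the old "goto next_task" walk. It is only a sketch under stated
assumptions: struct blocked_entry, chase_with_restarts(), MAX_RESTARTS
and the integer owner ids are all made up for illustration and are not
kernel code, and the restart cap exists only so that the model returns
instead of spinning. It also assumes that the blocked list can contain
a cycle of owners that does not pass through the caller; whether that
is what actually happens in the hangs we see is not established here.

```c
#include <stddef.h>

/*
 * Hypothetical model of one blocked_list entry: "waiter" stands in for
 * the owner of a blocked request (fl), "blocker" for the owner of the
 * lock it is waiting on (fl->fl_next). Owners are plain ints here.
 */
struct blocked_entry {
	int waiter;
	int blocker;
};

/* Illustration only: the loop being modelled has no such cap. */
enum { MAX_RESTARTS = 1000 };

/*
 * Mimics the shape of the pre-patch walk: on a match, follow the
 * blocker and rescan the list from the start.
 * Returns 1 if the walk reaches the caller, 0 if the chain ends,
 * and -1 if the cap is hit (the uncapped walk would not terminate).
 */
static int chase_with_restarts(const struct blocked_entry *blocked, size_t n,
			       int caller, int block_owner)
{
	size_t i;
	int restarts = 0;

next_task:
	if (caller == block_owner)
		return 1;
	for (i = 0; i < n; i++) {
		if (blocked[i].waiter == block_owner) {
			if (++restarts > MAX_RESTARTS)
				return -1;
			block_owner = blocked[i].blocker;
			goto next_task;
		}
	}
	return 0;
}

/* Owner 2 waits on owner 3, and owner 3 is not blocked. */
static const struct blocked_entry chain[] = { { 2, 3 } };

/* Owner 2 waits on owner 1, which is the caller in the examples. */
static const struct blocked_entry cycle_with_caller[] = { { 2, 1 } };

/* Owners 2 and 3 wait on each other; the caller (1) is not involved. */
static const struct blocked_entry cycle_without_caller[] = {
	{ 2, 3 }, { 3, 2 },
};
```

With caller 1 blocked by owner 2, the first table ends the walk with 0
and the second reaches the caller and gives 1. The third never reaches
owner 1 and never runs out of matches, so only the cap stops it.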

diff --git a/fs/locks.c b/fs/locks.c
index 7f9a3ea..7669a0c 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -702,14 +702,11 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
 {
 	struct file_lock *fl;

-next_task:
 	if (posix_same_owner(caller_fl, block_fl))
 		return 1;
 	list_for_each_entry(fl, &blocked_list, fl_link) {
-		if (posix_same_owner(fl, block_fl)) {
-			fl = fl->fl_next;
-			block_fl = fl;
-			goto next_task;
+		if (posix_same_owner(fl, caller_fl)) {
+			return 1;
 		}
 	}
 	return 0;

--
Regards,
George