On Mon, Jul 10, 2006 at 12:46:23PM -0400, Stephane Doyon wrote:
> Hi,

Hi Stephane,

> A few months back, a fix was introduced for a mutex double unlock warning
> related to direct I/O in XFS. I believe that fix has a lock ordering
> problem that can cause a deadlock.
>
> The problem was that __blockdev_direct_IO() would unlock the i_mutex in
> the READ and DIO_OWN_LOCKING case, and the mutex would be unlocked again
> in xfs_read() soon after returning from __generic_file_aio_read(). Because
> there are lots of execution paths down from __generic_file_aio_read() that
> do not consistently release the i_mutex, the safest fix was deemed to be
> to reacquire the i_mutex before returning from __blockdev_direct_IO(). At
> this point however, the reader is holding an xfs ilock, and AFAICT the
> i_mutex is usually taken BEFORE the xfs ilock.

That is correct, yes. Hmm. Subtle. Painful. Thanks for the detailed
report and your analysis.
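
To spell out the inversion with stand-in locks: after the double-unlock fix
the direct read path ends up looking roughly like the sketch below. This is
a userspace illustration only -- pthread primitives standing in for the
i_mutex and the iolock, not the real call chains:

  #include <pthread.h>

  /* Stand-ins: inode->i_mutex and the XFS iolock (shared/exclusive). */
  static pthread_mutex_t  i_mutex = PTHREAD_MUTEX_INITIALIZER;
  static pthread_rwlock_t iolock  = PTHREAD_RWLOCK_INITIALIZER;

  /* The direct I/O read path as it looks after the double-unlock fix. */
  static void direct_read_path(void)
  {
          pthread_mutex_lock(&i_mutex);    /* xfs_read()                   */
          pthread_rwlock_rdlock(&iolock);  /* xfs_ilock(XFS_IOLOCK_SHARED) */

          /* __blockdev_direct_IO(): READ + DIO_OWN_LOCKING drops i_mutex */
          pthread_mutex_unlock(&i_mutex);

          /* ... the actual direct I/O happens here ... */

          /* ... then i_mutex is reacquired while the iolock is still
           * held, so i_mutex now nests inside the iolock. */
          pthread_mutex_lock(&i_mutex);

          pthread_rwlock_unlock(&iolock);
          pthread_mutex_unlock(&i_mutex);
  }

  int main(void)
  {
          /* Single-threaded, so this runs fine; the inversion only
           * bites once a concurrent writer is in the picture. */
          direct_read_path();
          return 0;
  }

That reacquire is where i_mutex gets taken while the iolock is already
held, i.e. the reverse of the usual order.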

> We are seeing a deadlock between a process writing and another process
> reading the same file, when the reader is using direct I/O. (The writer

Is that a direct writer or a buffered writer?

> must apparently be growing the file, using either direct or buffered
> I/O.) Something like this, on XFS:
>
>   (dd if=/dev/zero of=f bs=128K count=5000 & ) ; sleep 1 ; dd of=/dev/null
>   if=f iflag=direct bs=128K count=5000
>
> Seen on kernels 2.6.16 and 2.6.17.
>
> The deadlock scenario appears to be this:
>
> - The reader goes into xfs_read(), takes the i_mutex and then an xfs_ilock
> XFS_IOLOCK_SHARED, then calls down to __generic_file_aio_read() which
> eventually goes down to __blockdev_direct_IO(). In there it drops the
> i_mutex.
> - The writer goes into xfs_write() and obtains the i_mutex. It then calls
> xfs_ilock() wanting the iolock exclusively (XFS_IOLOCK_EXCL) and must wait,
> since the reader holds it shared.
> - The reader, still in __blockdev_direct_IO(), executes direct_io_worker()
> and then tries to reacquire the i_mutex, and must wait on it because the
> writer has it.
>
> And so we have a deadlock.

*nod*. This will require some thought, I'm not sure I like the sound of
your suggested workaround there a whole lot, unfortunately, but maybe it
is all we can do at this stage. Let me ponder further and get back to you
(but if you want to prototype your fix further, that'd be most welcome, of
course).
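
FWIW, the interleaving you describe is easy to see outside the kernel too.
Extending the stand-in sketch from above with a writer thread (the sleeps
only force the ordering you outline; build with cc -pthread; again purely
illustrative, not a patch) hangs in exactly the way you describe:

  #include <pthread.h>
  #include <stdio.h>
  #include <unistd.h>

  /* Stand-ins: inode->i_mutex and the XFS iolock (shared/exclusive). */
  static pthread_mutex_t  i_mutex = PTHREAD_MUTEX_INITIALIZER;
  static pthread_rwlock_t iolock  = PTHREAD_RWLOCK_INITIALIZER;
  static volatile int finished;

  static void *reader(void *unused)
  {
          pthread_mutex_lock(&i_mutex);    /* xfs_read()                   */
          pthread_rwlock_rdlock(&iolock);  /* xfs_ilock(XFS_IOLOCK_SHARED) */
          pthread_mutex_unlock(&i_mutex);  /* dropped in __blockdev_direct_IO() */

          sleep(2);                        /* the I/O; the writer sneaks in */

          pthread_mutex_lock(&i_mutex);    /* reacquire: blocks on the writer */
          pthread_mutex_unlock(&i_mutex);
          pthread_rwlock_unlock(&iolock);
          finished++;
          return NULL;
  }

  static void *writer(void *unused)
  {
          sleep(1);                        /* start after the reader dropped i_mutex */
          pthread_mutex_lock(&i_mutex);    /* xfs_write() takes i_mutex     */
          pthread_rwlock_wrlock(&iolock);  /* wants the iolock exclusive:
                                              blocks on the reader's shared hold */
          pthread_rwlock_unlock(&iolock);
          pthread_mutex_unlock(&i_mutex);
          finished++;
          return NULL;
  }

  int main(void)
  {
          pthread_t r, w;

          pthread_create(&r, NULL, reader, NULL);
          pthread_create(&w, NULL, writer, NULL);
          sleep(4);
          if (finished == 2)
                  puts("no deadlock");
          else
                  puts("deadlocked: reader waits for i_mutex, writer for the iolock");
          return 0;
  }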