Re: Excessive stall times on ext4 in 3.9-rc2
From: Theodore Ts'o
Date: Thu Apr 11 2013 - 14:36:07 EST
On Thu, Apr 11, 2013 at 06:04:02PM +0100, Mel Gorman wrote:
> > If we're stalling on lock_buffer(), that implies that buffer was being
> > written, and for some reason it was taking a very long time to
> > complete.
> > It might be worthwhile to put a timestamp in struct dm_crypt_io, and
> > record the time when a particular I/O encryption/decryption is getting
> > queued to the kcryptd workqueues, and when they finally squirt out.
> That somewhat assumes that dm_crypt was at fault which is not unreasonable
> but I was skeptical as the workload on dm_crypt was opening a maildir
> and mostly reads.
Hmm... well, I've reviewed all of the places in the ext4 and jbd2
layer where we call lock_buffer(), and with one exception we're not
holding the the bh locked any longer than necessary. There are a few
places where we grab a spinlock or two before we can do what we need
to do and then release the lock'ed buffer head, but the only time we
hold the bh locked for long periods of time is when we submit metadata
blocks for I/O.
 There is one exception in ext4_xattr_release_block() where I
believe we should move the call to unlock_buffer(bh) before the call
to ext4_free_blocks(), since we've already elevanted the bh count and
ext4_free_blocks() does not need to have the bh locked. It's not
related to any of the stalls you've repored, though, as near as I can
tell (none of the stack traces include the ext4 xattr code, and this
would only affect external extended attribute blocks).
Could you code which checks the hold time of lock_buffer(), measuing
from when the lock is successfully grabbed, to see if you can see if I
missed some code path in ext4 or jbd2 where the bh is locked and then
there is some call to some function which needs to block for some
random reason? What I'd suggest is putting a timestamp in buffer_head
structure, which is set by lock_buffer once it is successfully grabbed
the lock, and then in unlock_buffer(), if it is held for more than a
second or some such, to dump out the stack trace.
Because at this point, either I'm missing something or I'm beginning
to suspect that your hard drive (or maybe something the block layer?)
is simply taking a long time to service an I/O request. Putting in
this check should be able to very quickly determine what code path
and/or which subsystem we should be focused upon.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/