Re: PROBLEM: Possible race between xen, md, dm and/or xfs

From: Matt Wilson
Date: Wed Jun 13 2012 - 20:19:14 EST


On Tue, Jun 12, 2012 at 05:11:37AM -0700, Jason Stubbs wrote:
> On 2012-6-12 13:57 , Dave Chinner wrote:
> > Nothing
> > wrong with MD, LVM, or XFS. The problem is either that EBS never
> > completed the IO, or Xen swallowed it and it never made it to the
> > guest OS. Either way, it does not appear to be a problem in the
> > higher levels of the linux storage stack.
>
> Thanks Dave for looking into this.
>
> I'll be sure to give Amazon ample opportunity to diagnose things from
> their side should the issue occur again and hopefully there won't be
> any more people reporting extraneous issues.

Hi Jason,

If you're able to reproduce this hang, I'm sure that we can get to the
root of the problem quite quickly. Short of that, if you can provide a
running instance that is exhibiting the problem we can do some
live-system debugging. It is much more difficult to determine root
cause and verify fixes without reproduction instructions.

Given the kernel version you reported in your traces, I can at least
rule out one known bug that caused blkfront to wait forever for an IO
to complete:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=dffe2e1

The kernel version you're using includes the follow-on change to
use fasteoi:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=3588fe2

I'm sorry that I can't be of more immediate help. If you
encounter the problem again, please contact developer support.

Matt