Re: Reproducable OOPS with MD RAID-5 on 2.6.0-test11 - with XFS

From: Neil Brown
Date: Mon Dec 01 2003 - 18:07:31 EST

Next message: Jean Tourrilhes: "Re: weird irda problem with 2.6 kernel and ericsson phone"
Previous message: Martin J. Bligh: "Re: hash table sizes"
In reply to: Linus Torvalds: "Re: Reproducable OOPS with MD RAID-5 on 2.6.0-test11"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Monday December 1, axboe@xxxxxxx wrote:
> On Mon, Dec 01 2003, Kevin P. Fleming wrote:
> > I've got a new system here with six SATA disks set up in a RAID-5 array
> > (no partition tables, using the whole disks). I then used LVM2 tools to
> > make the RAID array a physical volume, created a logical volume and
> > formatted that volume with an XFS filesystem.
> >
> > Mounting the filesystem and copying over the 2.6 kernel source tree
> > produces this OOPS (and is pretty reproducible):
> >
> > kernel BUG at fs/bio.c:177!
>
> It's doing a put on an already freed bio, that's really bad.
>
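
Jens's diagnosis ("a put on an already freed bio") can be illustrated with a small userspace model of reference counting. This is a sketch under stated assumptions: struct model_bio and the model_bio_* functions are hypothetical stand-ins, not the kernel's struct bio or bio_put(). Instead of oopsing, the model records the bad put in a counter, and unlike the kernel it never really frees the memory, so the extra put can be observed safely.

```c
#include <assert.h>    /* used by the usage checks */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a reference-counted I/O request.
 * Not the kernel's struct bio; names are illustrative only. */
struct model_bio {
	atomic_int cnt;   /* reference count */
	bool freed;       /* set when the last reference is dropped */
	int bug_hits;     /* counts "BUG" conditions instead of oopsing */
};

static void model_bio_init(struct model_bio *b)
{
	atomic_init(&b->cnt, 1);   /* creator holds the first reference */
	b->freed = false;
	b->bug_hits = 0;
}

static void model_bio_get(struct model_bio *b)
{
	atomic_fetch_add(&b->cnt, 1);
}

/* Drop a reference. A put while the count is already zero is the
 * "put on an already freed object" case; a kernel would BUG() here,
 * this model only records it. The load-then-decrement pair is a
 * debugging check, not an atomic operation as a whole. */
static void model_bio_put(struct model_bio *b)
{
	if (atomic_load(&b->cnt) == 0) {
		b->bug_hits++;
		fprintf(stderr, "BUG: put on already freed object\n");
		return;
	}
	if (atomic_fetch_sub(&b->cnt, 1) == 1)
		b->freed = true;   /* last reference gone: "free" it */
}
```

In this model one put per get is balanced; a second put after the count reaches zero is exactly the condition that trips the check.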

That makes two bug reports that seem to suggest that raid5 is calling
bi_end_io twice on the same bio.

The other one was from Eric Jensen <ej@xxxxxxxxxxxx> on 26 Nov 2003,
with Subject: PROBLEM: 2.6.0-test10 BUG/panic in mpage_end_io_read

Both involve xfs and raid5.
I, of course, am tempted to blame xfs.....

In this case, I don't think that raid5 calling bi_end_io twice would
cause the problem, as the bi_end_io that raid5 calls is clone_end_io,
and that has an atomic_t to make sure it only calls its bi_end_io
(bio_end_io_pagebuf) once, even if it were called multiple times itself.
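
The once-only guarantee described above can be sketched with an atomic counter. This is a minimal model, assuming a pending-completion count that starts at 1 and is decremented on every completion; the struct and function names are hypothetical and are not the actual clone_end_io code. Only the call that takes the count from 1 to 0 forwards completion to the original request, so a duplicate call from the layer below is absorbed.

```c
#include <assert.h>    /* used by the usage checks */
#include <stdatomic.h>

/* Hypothetical original request owned by the upper layer (e.g. the
 * filesystem). Counts how often its completion callback runs. */
struct model_orig {
	int end_io_calls;
};

/* Hypothetical clone handed to the lower layer (e.g. raid5). */
struct model_clone {
	atomic_int pending;        /* completions still expected */
	struct model_orig *orig;
};

/* Stands in for the upper layer's end_io callback. */
static void model_orig_end_io(struct model_orig *o)
{
	o->end_io_calls++;
}

static void model_clone_init(struct model_clone *c, struct model_orig *o)
{
	atomic_init(&c->pending, 1);
	c->orig = o;
}

/* Called by the lower layer when the clone completes. Only the caller
 * that sees the old value 1 forwards completion; any extra (buggy)
 * call sees 0 or less and does nothing. */
static void model_clone_end_io(struct model_clone *c)
{
	if (atomic_fetch_sub(&c->pending, 1) == 1)
		model_orig_end_io(c->orig);
}
```

With this guard, calling model_clone_end_io() twice on the same clone still completes the original request exactly once, which is why a double bi_end_io from raid5 alone would not be expected to reach bio_end_io_pagebuf twice.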

So I'm wondering if xfs might be doing something funny after
submitting the request to raid5... though I don't find that convincing
either.

In this report, the IO seems to have been requested from the
pagebuf code (fs/xfs/pagebuf/page_buf.c). In the other one it
is coming from mpage, presumably from inside xfs/linux/xfs_aops.c.
These are very different code paths and are unlikely to share a bug
like this.

Which does tend to point the finger back at raid5. :-(

I'd love to see some more reports of similar bugs, in the hope that
they might shed some more light.

NeilBrown
