Re: Oops on 2.6.9-ac16: xfs, dm and md may be involved

From: Nathan Scott
Date: Wed Dec 22 2004 - 18:13:43 EST


On Wed, Dec 22, 2004 at 08:52:03PM +0100, Joerg Sommrey wrote:
> On Wed, Dec 22, 2004 at 06:26:06PM +0000, Christoph Hellwig wrote:
> > On Tue, Dec 21, 2004 at 07:57:54PM +0100, Joerg Sommrey wrote:
> > > Hello,
> > >
> > > last night my box died with a kernel oops. There was a backup
> > > running at that time. The setup:
> > > - 2 SATA disks + 1 SCSI disk
> > > - SATA partitions make up md RAID arrays (levels 0 and 1)
> > > - md-raid-devices and SCSI partitions are physical volumes for dm
> > > - dm logical volumes are used for xfs filesystems
> > > - backup is done on dm-snapshots of those filesystems
> >
> > Given the strange backtrace and this enormous stack of drivers I bet
> > you're seeing a stack overflow.

Hmm, I'm not really sure of that, Christoph - this was inside a
kernel thread (xfsbufd) where there is almost nothing on the
stack at the point we dove into driver land. It looked like a
genuine bug to me. There were plenty of calls on the trace,
but I think several of those were badly guessed by the stack
dump code. A couple of registers holding a memory poison
pattern also looked a bit suspect.

> Does this mean that this kind of stuff just doesn't work? I was running
> a 4K-stack kernel with this "stack of drivers" for quite some time without
> problems. The problems started around 2.6.9-pre-something. Converting
> to 8K-stacks didn't help. Is this only xfs related?

It certainly wasn't XFS eating the stack in the initial oops;
perhaps the lower layers were, but I'm a bit sceptical. Almost
certainly this is a device mapper snapshot problem; the DM folks
should be able to analyse it further.

cheers.

--
Nathan