Re: kernel BUG at fs/ext/super.c:428

From: Pallipadi, Venkatesh
Date: Wed Jan 14 2009 - 14:39:00 EST


On Wed, Jan 14, 2009 at 11:29:37AM -0800, Peter Zijlstra wrote:
> On Wed, 2009-01-14 at 11:16 -0800, Pallipadi, Venkatesh wrote:
> > On Tue, Jan 13, 2009 at 08:40:59PM -0800, Theodore Tso wrote:
> > > On Wed, Jan 14, 2009 at 02:48:13AM +0000, Arjan van de Ven wrote:
> > > >> Well, Arjan's commit, efaee192: "async: make the final inode deletion
> > > >> an asynchronous event", does change how inodes get deleted, and this
> > > >> looks like a race where an inode is getting deleted during the umount.
> > > >>
> > > >> So I would try reverting commit efaee192 and see if it fixes things
> > > >> before starting a full bisect...
> > > >
> > > > the commit is already reverted before rc1
> > > >
> > >
> > > Ah, right. I see, the async infrastructure is still in fs/super.c,
> > > but the actual code to insert deleted inodes onto the s_async_list was
> > > removed in commit b32714b. Sorry, that confused me.
> > >
> > > OK, so assuming that Venkatesh was using something post-rc1, I can't
> > > suggest anything other than a full bisect. Sorry....
> > >
> >
> > Below is the result of full bisect
> >
> > 38d47c1b7075bd7ec3881141bb3629da58f88dab is first bad commit
> > commit 38d47c1b7075bd7ec3881141bb3629da58f88dab
> > Author: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> > Date: Fri Sep 26 19:32:20 2008 +0200
> >
> > futex: rely on get_user_pages() for shared futexes
> >
> > On the way of getting rid of the mmap_sem requirement for shared futexes,
> > start by relying on get_user_pages().
> >
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> > Acked-by: Nick Piggin <nickpiggin@xxxxxxxxxxxx>
> > Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
> >
> > :040000 040000 029e79e0a7421438c2a7437dd210a1acf40b6c29 b581716762c1952c0f515fac642690514e7224b7 M include
> > :040000 040000 5f604b60974dbb9b0ac1a0910234f28c43a5e691 10c576cabe7eae661501ec38861a0f7488d5353b M kernel
>
> However does a futex change make ext3 crap its pants?
>
> Is there anything more to it than start the machine, and reboot?

Just system startup and reboot is enough to reproduce the problem. And 100%
reproducible. So, does seem to be any timing involved either.

Thanks,
Venki
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/