Re: 2.6.19 file content corruption on ext3

From: Andrew Morton
Date: Mon Dec 18 2006 - 20:55:37 EST


On Tue, 19 Dec 2006 03:44:51 +0200
Andrei Popa <andrei.popa@xxxxxxxx> wrote:

> On Mon, 2006-12-18 at 17:21 -0800, Andrew Morton wrote:
> > On Mon, 18 Dec 2006 16:57:30 -0800 (PST)
> > Linus Torvalds <torvalds@xxxxxxxx> wrote:
> >
> > > What happens if you only ifdef out that single thing?
> > >
> > > The actual page-cleaning functions make sure to only clear the TAG_DIRTY
> > > bit _after_ the page has been marked for writeback. Is there some ordering
> > > constraint there, perhaps?
> > >
> > > I'm really reaching here. I'm trying to see the pattern, and I'm not
> > > seeing it. I'm asking you to test things just to get more of a feel for
> > > what triggers the failure, than because I actually have any kind of idea
> > > of what the heck is going on.
> > >
> > > Andrew, Nick, Hugh - any ideas?
> >
> > If all of test_clear_page_dirty() has been commented out, then the page will
> > never become clean and hence will never fall out of pagecache. So unless
> > Andrei is rebooting before checking for corruption, the underlying on-disk
> > data may well be incorrect without our being able to see it.
>
> If I do a sync and echo 1 > /proc/sys/vm/drop_caches,

OK, that works.

> is the reboot
> still necessary?

It might be necessary to reboot in this case: if we're leaving the
pagecache dirty, writing to drop_caches won't remove it, because
drop_caches only discards clean pages. And you probably won't be able
to get a clean reboot either.
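A minimal sketch of the pagecache-versus-disk comparison, assuming GNU coreutils sha1sum and root access for the drop_caches write; the function names and the manifest path are hypothetical. It checksums the files as currently read, writes back and drops clean pagecache, then re-reads and compares. If the pages were never cleaned (for example with test_clear_page_dirty() disabled), the second read may still be served from pagecache, so a match proves nothing in that case.

```shell
#!/bin/sh
# Hypothetical helpers for comparing pagecache contents with on-disk
# contents. Assumes GNU coreutils sha1sum; drop_clean_pagecache needs root.

# record_sums MANIFEST FILE...
# Checksum the files as currently read (served from pagecache if resident).
# Paths are stored as given, so verify from the same working directory.
record_sums() {
    manifest=$1
    shift
    sha1sum -- "$@" > "$manifest"
}

# drop_clean_pagecache
# Write back dirty data, then ask the kernel to drop clean pagecache.
# Pages that are still dirty are NOT dropped by this.
drop_clean_pagecache() {
    sync
    echo 1 > /proc/sys/vm/drop_caches
}

# verify_sums MANIFEST
# Re-read the files and compare against MANIFEST. Prints only mismatches;
# returns non-zero if any file differs or cannot be read.
verify_sums() {
    sha1sum --quiet -c -- "$1"
}

# Example (hypothetical paths, run as root from the download directory):
#   record_sums /tmp/rar.sha1 qwe*.rar
#   drop_clean_pagecache
#   verify_sums /tmp/rar.sha1 || echo "pagecache and re-read data differ"
```

A mismatch after the drop would mean the pagecache and the disk held different data; an independent check (such as rtorrent's hash check or "unrar t") is still needed to tell which copy is wrong.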

> >
> > Andrei, how _are_ you running this test? What's the exact sequence of steps?
> >
> > In particular, are you doing anything which would cause the corrupted file
> > to be evicted from memory, thus forcing a read from disk? Such as
> > unmounting and then remounting the filesystem?
>
> I boot Linux, start rtorrent and begin the download. While it's
> downloading I start Evolution and check my mail (my mbox is very large,
> several hundred megabytes), then close Evolution (I use Evolution just to
> have another application which uses the filesystem and memory) and
> start it again. I start Firefox. When the download completes, rtorrent
> reports whether the hash is good or not. I then run "unrar t qwe.rar" to
> test that all 84 downloaded rar files are OK, and check the result.
>
> >
> > The point of my question is to determine whether the data is really
> > incorrect on-disk, or only incorrect in pagecache.
> >
> > Also, it'd be useful if you could determine whether the bug appears with
> > the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with
> > rootfstype=ext2 if it's the root filesystem.
>
> I will test.

ok, thanks.
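The fstab change suggested above can be scripted. A minimal sketch, assuming an fstab-format file with whitespace-separated fields (device, mount point, type, options, ...); the set_ext2 helper is hypothetical. Unlike a blanket s/ext3/ext2/, it changes only the entry for the named mount point and keeps a backup. An ext3 filesystem can generally be mounted as ext2 only after a clean unmount, since the ext2 driver cannot replay the journal; for the root filesystem, booting with rootfstype=ext2 is the route suggested in the thread.

```shell
#!/bin/sh
# Hypothetical helper: change the type of ONE fstab entry from ext3 to ext2.
# Usage: set_ext2 FSTAB_FILE MOUNT_POINT
# Leaves a copy of the original file in FSTAB_FILE.bak.
set_ext2() {
    fstab=$1
    mnt=$2
    cp -p -- "$fstab" "$fstab.bak" || return 1
    # Skip comment lines; for the matching ext3 entry, rewrite field 3.
    # awk rebuilds only the modified line (tab-separated); other lines
    # are printed unchanged.
    awk -v m="$mnt" '
        BEGIN { OFS = "\t" }
        $1 !~ /^#/ && $2 == m && $3 == "ext3" { $3 = "ext2" }
        { print }
    ' "$fstab.bak" > "$fstab.new" && mv -- "$fstab.new" "$fstab"
}

# Example (as root, hypothetical mount point):
#   set_ext2 /etc/fstab /home
```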
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/