Re: data corrupting bug in 2.4.20 ext3, data=journal

From: Stephen C. Tweedie (sct@redhat.com)
Date: Mon Dec 02 2002 - 10:37:22 EST


On Mon, 2002-12-02 at 12:15, Stephen C. Tweedie wrote:

> The problem is that ext3 is expecting that truncate_inode_pages() (and
> hence ext3_flushpage) is only called during a truncate. That's what the
> function is named for, after all, and it's the hint we need to indicate
> that future writeback on the data we're discarding should be disabled
> (so that we don't get old data written on top of new data should the
> block get deallocated.)
>
> But kill_supers() eventually calls truncate_inode_pages() too when we're
> doing the invalidate_inodes().

But we've already called fsync_super() at this point. If that completes
synchronously, then the buffers will already be out of the journal and
queued for writeback when we get here, and clearing BH_JBDDirty won't
have any dire consequences.

Indeed, loading the ext3 module with "do_sync_supers=1" cures the
symptoms.

However, doing every superblock write synchronously is a non-starter, as
that requires a journal commit in the ext3 universe, and so this would
essentially force bdflush to serialise all of its filesystem flushes
(which is a real problem once you've got a significant number of
filesystems mounted.) But the VFS simply doesn't have any clean way of
telling foo_write_super() whether the call needs to be completed
synchronously or asynchronously.

There *is* a totally unclean way, though. kill_super() sets sb->s_root
to NULL before calling its final fsync_super(), and ext3 can easily
check that in ext3_write_super(). It's a nasty, messy dependency, but
should work for 2.4 at least. For 2.5 we probably want to look at
passing an async flag down into the write_super.

The attached patch seems to fix things for me.

Cheers,
 Stephen


--- linux-2.4-ext3merge/fs/ext3/super.c.=K0027=.orig 2002-12-02 15:35:13.000000000 +0000
+++ linux-2.4-ext3merge/fs/ext3/super.c 2002-12-02 15:35:14.000000000 +0000
@@ -1640,7 +1640,12 @@
         sb->s_dirt = 0;
         target = log_start_commit(EXT3_SB(sb)->s_journal, NULL);
 
- if (do_sync_supers) {
+ /*
+ * Tricky --- if we are unmounting, the write really does need
+ * to be synchronous. We can detect that by looking for NULL in
+ * sb->s_root.
+ */
+ if (do_sync_supers || !sb->s_root) {
                 unlock_super(sb);
                 log_wait_commit(EXT3_SB(sb)->s_journal, target);
                 lock_super(sb);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Dec 07 2002 - 22:00:13 EST