Re: [Ext2-devel] Re: [CHECKER] crash after fsync causing serious FS corruptions (ext2, 2.6.11)

From: Andreas Dilger
Date: Mon Mar 07 2005 - 19:22:32 EST


On Mar 07, 2005 14:55 -0800, Junfeng Yang wrote:
> > fsync on ext2 only really guarantees that the data has reached
> > the disk, what the disk does it outside the realm of the fs.
> > If the ide drive has write back caching enabled, the data just
> > might only be in cache. If the power is removed right after fsync
> > returns, the drive might not get a chance to actually commit the
> > write to platter.
>
> Thanks for the reply. I tried your patch, and also setting hdparm -W0.
> The warning is still there. This warning and the previous ones I reported
> should be irrelevant to IDE drivers, as FiSC (our FS checker) doesn't
> actually crash the machine but simulates a crash using a ramdisk.
>
> It appears to me that this warning can be triggered by the following steps:
>
> 1. create a file A with several data blocks. fsync(A) to disk
>
> 2. truncate A to a smaller size, causing a few blocks to be freed.
> However, they are only freed in memory. The corresponding changes in
> bitmaps haven't yet hit the disk.
>
> 3. create a file B with several data blocks. ext2 will re-use the freed
> blocks from step 2.
>
> 4. fsync(B). Once fsync returns, crash.

In ext3 this case is handled because the filesystem won't reallocate the
metadata blocks freed from file A before they have been committed to disk.
Also, the operations on file A are guaranteed to complete before or with
operations on file B so fsync(B) will also cause the changes from A to
be flushed to disk at the same time (this is guaranteed to complete before
fsync(B) returns).

I'm not sure how easy it would be to fix this in ext2 without introducing
at least some of the mechanisms from ext3, nor whether there is desire to
do this given the presence of ext3.

> At this moment, the truncate in step 2 hasn't reached the disk yet, so the
> file A on disk still contains pointers to the freed blocks. However, the
> fsync(B) in step 4 flushes B's inode and other metadata to disk. Now we
> end up with a file system where a block is shared by two files.
>
> I'm not sure how the invalid block number warning is triggered.

If B file was larger than 48kB, and you are filling it with e.g. 0xffffffe
then this could overwrite file A's indirect block from the data in file B.

Cheers, Andreas
--
Andreas Dilger
http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/

Attachment: pgp00000.pgp
Description: PGP signature