Re: [CHECKER] e2fsck writes out blocks out of order, causing root dir to be corrupted (ext3, linux 2.4.19, e2fsprogs 1.34)

From: Valdis . Kletnieks
Date: Tue May 11 2004 - 22:12:21 EST

On Tue, 11 May 2004 22:45:33 EDT, "Richard B. Johnson" said:

> Question? Is fsck specified to be able to be crashed? I'm not
> sure you could ever make a repair-tool that could do that unless
> there was some "guaranteed to save device" on an independent power
> source during the repair. Fsck can't commit partial fixes of some
> stuff because it would leave the file-system in an unrecoverable
> state. It needs to complete.

On the flip side, if you poke through the code in fs/ext2/, you'll find that
a very large percentage of the code is not actually doing directly productive
work, but merely making sure that things always go to disk in the right
order, and double checking that things haven't been changed out from under
us, and the like.

I suspect this bug is merely a special case of "your filesystem can get scrogged
if something's doing caching behind your back" - the same sort of issues
that prompted recent "flush the IDE cache on shutdown" fixes, and the
well-known issues with using journalling file systems on a file-backed loopback

Having said that, I admit being surprised that their demonstration test case
is *that* simple - that's a truly small number of I/Os to get it into a repeatably
corruptible state. I'm sure many of us have a mental image of these class of
failures as being heisenbugs, dependent on the cache contents.

> Judging by the number of Stanford people being copied, I would
> guess that this is a troll-probe?

Hardly - the class of errors is one that does (or should) concern the kernel
community - and I don't consider identifying a "your filesystem *will* be toast
if you get into this repeatable scenario" a troll. At the very least, we can
consider what additional hardening we can do to either the kernel or userspace
to make sure that we don't re-order the blocks - note the key phrase here:

"Neither of these pay attention to the journaling constraints of EXT3 and JBD."

There seems to have been a thread about write barriers for IDE drives back in
Feb 2003, to address exactly this issue. Has the current 2.4/2.6 tree had any
significant real improvement regarding this since the admittedly old 2.4.19
kernel that the Stanford crew was testing? I can't remember if that thread
resulted in any committed code actually used by the filesystems....

Attachment: pgp00000.pgp
Description: PGP signature