Re: [patch] ext2/3: document conditions when reliable operation is possible

From: david
Date: Tue Aug 25 2009 - 19:47:25 EST


On Wed, 26 Aug 2009, Pavel Machek wrote:

Basically, any file system (Linux, windows, OSX, etc) that writes into
the page cache will lose data when you hot unplug its storage. End of
story, don't do it!

No, not ext3 on SATA disk with barriers on and proper use of
fsync(). I actually tested that.

Yes, I should be able to hotunplug SATA drives and expect the data
that was fsync-ed to be there.
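
For readers unsure what "proper use of fsync()" looks like in practice, here is a minimal sketch in Python. The helper name durable_write is hypothetical and not from this thread. It assumes a Linux system where a directory can be opened and fsync-ed, and it only asks the kernel to flush: whether the data actually reaches the platter still depends on barriers and on the drive honouring cache flushes, as discussed above.

```python
import os

def durable_write(path, data):
    """Write data to path and request that it reach stable storage.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Flush file data and metadata from the page cache to the device.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Also fsync the containing directory so the directory entry for a
    # newly created file is on stable storage too (Linux behaviour assumed).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Only data for which fsync() has returned successfully is covered by the claim being argued here; anything still sitting in the page cache at unplug time is expected to be lost.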

You can and will lose data (even after fsync) with any type of storage at
some rate. What you are missing here is that data loss needs to be
measured in hard numbers - say percentage of installed boxes that have
config X that lose data.

I'm talking "by design" here.

I will lose data even on SATA drive that is properly powered on if I
wait 5 years.

I can promise you that hot unplugging and replugging a S-ATA drive will
also lose you data if you are actively writing to it (ext2, 3, whatever).

I can promise you that running S-ATA drive will also lose you data,
even if you are not actively writing to it. Just wait 10 years; so
what is your point?

But ext3 is _designed_ to preserve fsynced data on SATA drive, while
it is _not_ designed to preserve fsynced data on MD RAID5.

substitute 'degraded MD RAID 5' for 'MD RAID 5' and you have a point
here, although the language you are using is pretty harsh. you make it
sound like this is a problem with ext3 when the filesystem has nothing
to do with it. the problem is that a degraded raid 5 array can be
corrupted by an additional failure.
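
to make the degraded case concrete, here is a toy model in Python. it is not how the md driver is implemented: it assumes a single stripe with two data blocks and one XOR parity block, and the names (xor, reconstruct_missing) are made up for illustration. it shows how an interrupted write to one block can corrupt a different block that was never written, once that block has to be rebuilt from parity.

```python
def xor(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def reconstruct_missing(surviving_data, parity):
    """Rebuild the block of the failed disk from the survivor and parity."""
    return xor(surviving_data, parity)

# Healthy stripe: two data blocks and their parity.
d0 = b"AAAA"
d1 = b"BBBB"
parity = xor(d0, d1)

# Array goes degraded: the disk holding d1 is gone, but d1 can still be
# rebuilt correctly from d0 and parity.
assert reconstruct_missing(d0, parity) == d1

# Now d0 is updated. A complete update writes both new_d0 and new parity.
new_d0 = b"CCCC"
new_parity = xor(new_d0, d1)
assert reconstruct_missing(new_d0, new_parity) == d1

# Additional failure: power is lost after new_d0 hits the disk but before
# the parity block is rewritten, so the on-disk parity is stale.
stale_parity = parity
rebuilt_d1 = reconstruct_missing(new_d0, stale_parity)

# d1 was never written by anyone, yet what the array now returns for it
# is no longer the original contents.
assert rebuilt_d1 != d1
```

the filesystem on top never touched d1, so neither a journal nor fsync() ordering at the filesystem level can account for that block changing.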

Do you really think that's not a difference?

I don't object to making that general statement - "Don't hot unplug a
device with an active file system or actively used raw device" - but
would object to the overly general statement about ext3 not working on
flash, RAID5 not working, etc...

You can object any way you want, but running ext3 on flash or MD RAID5
is stupid:

* ext2 would be faster

* ext2 would provide better protection against powerfail.

Not true in the slightest, you continue to ignore the ext2/3/4 developers
telling you that it will lose data.

I know I will lose data. Both ext2 and ext3 will lose data on
flashdisk. (That's what I'm trying to document). But... what is the
benefit of ext3 journaling on MD RAID5? (On flash, ext3 at least
protects you against kernel panic. MD RAID5 is in software, so... that
additional protection is just not there).

Faster recovery time on any normal kernel crash or power outage. Data
loss would be equivalent with or without the journal.

No, because you'll actually repair the ext2 with fsck after the kernel
crash or power outage. Data loss will not be equivalent; in particular
you'll not lose data written _after_ power outage to ext2.

by the way, while you are thinking about failures that can happen from
a failed write corrupting additional blocks, think about the nightmare
that can happen if those blocks are in the journal.

the 'repair' of ext2 by a fsck is actually much less than you are thinking that it is.

David Lang