Re: the " 'official' point of view" expressed by kernelnewbies.orgregarding reiser4 inclusion

From: David Masover
Date: Tue Aug 01 2006 - 13:38:31 EST


Alan Cox wrote:
Ar Maw, 2006-08-01 am 11:44 -0500, ysgrifennodd David Masover:
Yikes. Undetected.

Wait, what? Disks, at least, would be protected by RAID. Are you telling me RAID won't detect such an error?

Yes.

RAID deals with the case where a device fails. RAID 1 with 2 disks can
in theory detect an internal inconsistency but cannot fix it.

Still, if it does that, that should be enough. The scary part wasn't that there's an internal inconsistency, but that you wouldn't know.

And it can fix it if you can figure out which disk went. Or give it 3 disks and it should be entirely automatic -- admin gets paged, admin hotswaps in a new disk, done.

we're OK with that, so long as our filesystems are robust enough. If it's an _undetected_ error, doesn't that cause way more problems (impossible problems) than FS corruption? Ok, your FS is fine -- but now your bank database shows $1k less on random accounts -- is that ok?

Not really no. Your bank is probably using a machine (hopefully using a
machine) with ECC memory, ECC cache and the like. The UDMA and SATA
storage subsystems use CRC checksums between the controller and the
device. SCSI uses various similar systems - some older ones just use a
parity bit so have only a 50/50 chance of noticing a bit error.

Similarly the media itself is recorded with a lot of FEC (forward error
correction) so will spot most changes.

Unfortunately when you throw this lot together with astronomical amounts
of data you get burned now and then, especially as most systems are not
using ECC ram, do not have ECC on the CPU registers and may not even
have ECC on the caches in the disks.

It seems like this is the place to fix it, not the software. If the software can fix it easily, great. But I'd much rather rely on the hardware looking after itself, because when hardware goes bad, all bets are off.

Specifically, it seems like you do mention lots of hardware solutions, that just aren't always used. It seems like storage itself is getting cheap enough that it's time to step back a year or two in Moore's Law to get the reliability.

The sort of changes this needs hit the block layer and ever fs.
Seems it would need to hit every application also...

Depending how far you propogate it. Someone people working with huge
data sets already write and check user level CRC values for this reason
(in fact bitkeeper does it for one example). It should be relatively
cheap to get much of that benefit without doing application to
application just as TCP gets most of its benefit without going app to
app.

And yet, if you can do that, I'd suspect you can, should, must do it at a lower level than the FS. Again, FS robustness is good, but if the disk itself is going, what good is having your directory (mostly) intact if the files themselves have random corruptions?

If you can't trust the disk, you need more than just an FS which can mostly survive hardware failure. You also need the FS itself (or maybe the block layer) to support bad block relocation and all that good stuff, or you need your apps designed to do that job by themselves.

It just doesn't make sense to me to do this at the FS level. You mention TCP -- ok, but if TCP is doing its job, I shouldn't also need to implement checksums and other robustness at the protocol layer (http, ftp, ssh), should I? Because in this analogy, it looks like TCP is the "block layer" and a protocol is the "fs".

As I understand it, TCP only lets the protocol/application know when something's seriously FUBARed and it has to drop the connection. Similarly, the FS (and the apps) shouldn't have to know about hardware problems until it really can't do anything about it anymore, at which point the right thing to do is for the FS and apps to go "oh shit" and drop what they're doing, and the admin replaces hardware and restores from backup. Or brings a backup server online, or...



I guess my main point was that _undetected_ problems are serious, but if you can detect them, and you have at least a bit of redundancy, you should be good. For instance, if your RAID reports errors that it can't fix, you bring that server down and let the backup server run.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/