Re: amd64 sata_nv (massive) memory corruption
From: John Stoffel
Date: Thu Aug 07 2008 - 14:53:51 EST
>>>>> "Martin" == Martin K Petersen <martin.petersen@xxxxxxxxxx> writes:
>>>>> "Linas" == Linas Vepstas <linasvepstas@xxxxxxxxx> writes:
Linas> My problem is that the corruption I see is "silent": so
Linas> redundancy is useless, as I cannot distinguish good blocks from
Linas> bad. I'm running RAID, one of the two disks returns bad data.
Linas> Without checksums, I can't tell which version of a block is the
Linas> good one.
Martin> But btrfs can.
Maybe. I'd not trust btrfs even now because the on-disk format is
going to change yet again from the currently released version. I'm
personally interested in it, but not quite enough to use it. :]
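To make the checksum point concrete, here is a minimal sketch of how a stored per-block checksum lets you choose between two mirror copies that disagree. It is an illustration only, not btrfs code: the function name pick_good_copy and the sample data are made up, and zlib's CRC32 stands in for whatever checksum a real filesystem would use.

```python
import zlib

def pick_good_copy(copies, stored_crc):
    """Return the first copy whose CRC32 matches the stored checksum, else None."""
    for data in copies:
        if zlib.crc32(data) == stored_crc:
            return data
    return None  # every copy is bad; redundancy cannot help here

good = b"important block"
stored_crc = zlib.crc32(good)   # recorded when the block was written
bad = b"importent block"        # silently corrupted copy from the other disk

# Without stored_crc the two copies merely differ; with it, one is provably wrong.
result = pick_good_copy([bad, good], stored_crc)
```

Plain RAID1 only has the two copies, so a mismatch tells you something is wrong but not which side to trust.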
Linas> There is also an interesting possibility that offers a middle
Linas> ground between raw performance and safety: instead of verifying
Linas> checksums on *every* read access, it could be enough to verify
Linas> only every so often -- say, only one out of every 10 reads, or
Linas> maybe triggered by a cron job in the middle of the night: turn
Linas> on verification, touch a bunch of files for an hour or two,
Linas> turn off verification before 6AM.
If you're reading the file off disk, it doesn't cost anything to
verify it then, especially if the checksum is either in the metadata
or next to the blocks themselves.
It's corruption in files which aren't read which turns into a
problem.
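For comparison, Linas's one-in-N idea might look roughly like the sketch below. The class name SampledVerifier and the every parameter are hypothetical, and CRC32 again stands in for a real filesystem checksum. Note that it only ever checks data that something actually reads, which is exactly the gap described above.

```python
import zlib

class SampledVerifier:
    """Verify the checksum on every Nth read only (hypothetical sketch)."""

    def __init__(self, every=10):
        self.every = every
        self.reads = 0
        self.verified = 0

    def read(self, data, stored_crc):
        self.reads += 1
        if self.reads % self.every == 0:
            self.verified += 1
            if zlib.crc32(data) != stored_crc:
                raise IOError("checksum mismatch")
        return data

block = b"some file data"
crc = zlib.crc32(block)
verifier = SampledVerifier(every=10)
for _ in range(25):
    verifier.read(block, crc)   # reads 10 and 20 are the only ones verified
```

A corrupted block that happens to be read on an unverified pass goes straight through to the caller.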
Martin> All evidence suggests that scrubbing is a good way to keep
Martin> your data healthy.
Yup. And mirroring anything you think is important. Disk is cheap,
mirroring is good.
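Scrubbing plus mirroring can be sketched as one pass over every block, read or not, that repairs a bad copy from the good one. This is a toy model under stated assumptions: the disks are Python lists of blocks, the checksums live in a separate list, and the scrub function and its report strings are invented for illustration.

```python
import zlib

def scrub(disk_a, disk_b, checksums):
    """Check every block on both mirrors; repair a bad copy from a good one.

    Returns a list of (block_index, action) for blocks that needed attention.
    """
    report = []
    for i, crc in enumerate(checksums):
        ok_a = zlib.crc32(disk_a[i]) == crc
        ok_b = zlib.crc32(disk_b[i]) == crc
        if ok_a and not ok_b:
            disk_b[i] = disk_a[i]
            report.append((i, "repaired b"))
        elif ok_b and not ok_a:
            disk_a[i] = disk_b[i]
            report.append((i, "repaired a"))
        elif not ok_a and not ok_b:
            report.append((i, "unrecoverable"))
    return report

blocks = [b"alpha", b"beta", b"gamma"]
checksums = [zlib.crc32(b) for b in blocks]
disk_a = list(blocks)
disk_b = list(blocks)
disk_b[1] = b"bexa"             # silent corruption on the second disk

report = scrub(disk_a, disk_b, checksums)
```

Run from cron, a pass like this catches corruption in files nobody has touched in months, while a good copy still exists to repair from.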
Heck, I'd pay good money for a SATA disk which mirrored inside itself
or which joined two separate spindle/head assemblies into one and did
all the error correction at a low level.