Re: amd64 sata_nv (massive) memory corruption

From: Linas Vepstas
Date: Wed Aug 06 2008 - 17:39:48 EST


2008/8/5 Alan Cox <alan@xxxxxxxxxxxxxxxxxxx>:

>> I'm game. Care to guide me through? So: on every write, this
>> new device mapper module computes a checksum and stores
>> it somewhere. On every read, it computes a checksum and
>> compares to the stored value. Easy enough I guess.
>>
>> Several hard parts:
>> -- where to store the checksums?
>
> That is the million dollar question - plus you can argue it is the fs
> that should do it. There is stuff crawling through the standards world to
> provide a small per block additional info area on disk sectors.

My objection to fs-layer checksums (e.g. in some user-space
file system) is that they don't leverage the extra info that RAID
has. If a block is bad, RAID can probably fetch another one
that is good. You can't do this at the file-system level.

I assume I can layer device-mappers anywhere, right?
Layering one *underneath* md-raid would allow it to
reject/discard bad blocks, and then let the raid layer
try to find a good block somewhere else.
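
To make the layering concrete, here is a toy user-space sketch in
Python, not kernel code. The class names are made up for illustration,
and CRC32 merely stands in for whatever checksum would really be used.
The lower layer fails any read whose checksum doesn't match, and the
mirror above it simply tries the next copy:

```python
import zlib


class ChecksummedDisk:
    """Toy block device that keeps a CRC32 alongside each block (hypothetical)."""

    def __init__(self):
        self.blocks = {}  # block number -> (data, stored checksum)

    def write(self, n, data):
        self.blocks[n] = (data, zlib.crc32(data))

    def read(self, n):
        data, stored = self.blocks[n]
        if zlib.crc32(data) != stored:
            # Reject the block instead of returning corrupt data.
            raise OSError(f"checksum mismatch on block {n}")
        return data

    def corrupt(self, n, data):
        """Simulate silent corruption: data changes, stored checksum does not."""
        _, stored = self.blocks[n]
        self.blocks[n] = (data, stored)


class Mirror:
    """Toy RAID-1 layered on top of checksummed disks (hypothetical)."""

    def __init__(self, disks):
        self.disks = disks

    def write(self, n, data):
        for disk in self.disks:
            disk.write(n, data)

    def read(self, n):
        for disk in self.disks:
            try:
                return disk.read(n)
            except OSError:
                continue  # this copy is bad, try the next one
        raise OSError(f"no good copy of block {n}")
```

With two disks, corrupting block 0 on the first one still lets
Mirror.read(0) return the good copy from the second; only when every
copy fails its checksum does the read fail.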

I assume that a device mapper can map a different number
of blocks in than blocks out; that the mapping doesn't
have to be 1-1. Then for every 10 sectors of data, it
would use 11 sectors of storage, one holding the
checksum. I'm very naive about how the block layer
works, so I don't know what snags there might be.
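
The arithmetic for the 10-plus-1 layout could look something like the
following Python sketch. It assumes the checksum sector is the last
sector of each 11-sector group, which is just one possible placement;
the function and constant names are made up:

```python
DATA_PER_GROUP = 10               # data sectors per group
GROUP_SIZE = DATA_PER_GROUP + 1   # plus one checksum sector


def map_sector(logical):
    """Map a logical data sector to (physical data sector, checksum sector).

    Assumes the checksum sector sits at the end of each 11-sector group.
    """
    group, offset = divmod(logical, DATA_PER_GROUP)
    base = group * GROUP_SIZE
    return base + offset, base + DATA_PER_GROUP


def usable_sectors(physical_sectors):
    """Data sectors available on a device, counting only complete groups."""
    return (physical_sectors // GROUP_SIZE) * DATA_PER_GROUP
```

So logical sector 10 lands on physical sector 11, with its checksum in
physical sector 21, and a device of 1100 sectors exposes 1000.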

The downside of this is that the disk wouldn't be
naively readable unless the specific mapper module
was in place -- so one would need a superblock of
some sort indicating the type of checksumming used,
etc. Is there any "standardized" way of managing
superblocks for use by the device mapper? I guess
the encrypting dm has to store meta-information
somewhere, too, specifying what kind of encryption
was used. I'll look at that.
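
As a strawman for what such a superblock might hold, here is a Python
sketch. The magic string, field layout and algorithm numbering are all
invented for illustration; this is not any existing on-disk format:

```python
import struct

SECTOR = 512
SB_MAGIC = b"DMCKSUM\0"   # hypothetical 8-byte magic
# magic, format version, checksum algorithm id, data sectors per group
SB_FORMAT = "<8sHHI"


def pack_superblock(version, algo, data_per_group):
    """Build a one-sector superblock, zero-padded to 512 bytes."""
    raw = struct.pack(SB_FORMAT, SB_MAGIC, version, algo, data_per_group)
    return raw.ljust(SECTOR, b"\0")


def unpack_superblock(sector):
    """Parse a superblock; raise ValueError if the magic is missing."""
    magic, version, algo, data_per_group = struct.unpack_from(SB_FORMAT, sector)
    if magic != SB_MAGIC:
        raise ValueError("not a checksummed device")
    return version, algo, data_per_group
```

A mapper would read this first sector at setup time and refuse to
attach if the magic is wrong, rather than silently treating raw data
as checksummed.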

> Yes. If you can figure out where to keep the checksums without ruining
> performance

Heh. Unlikely. The act of checksumming will impact
performance. It should end up similar to the impact
from encryption (maybe not quite as bad), or comparable
to raid-5 (which computes various kinds of parity).

> (and of course if there isn't one lurking in device mapper
> world not yet submitted).

I'm googling, but I don't see anything. However, I now see,
for the first time, pending work for 2.6.27 for a field in bio
called "blk_integrity". I cannot figure out if this work requires
special-whiz-bang disk drives to be purchased.

Also, it seems to be limited to 8 bytes of checksums per 512
byte block? This is reasonable for checksumming, I guess,
but one could get even fancier and run ECC-type sums, if
one could store, say, an additional 50 bytes for every 512
bytes. I'm cc'ing Martin Petersen, the developer, for
comments.
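
For a rough sense of the space cost, here is a back-of-the-envelope
Python sketch. It assumes the per-sector check data is packed into
separate ordinary 512-byte sectors (as in the 10-plus-1 scheme above)
rather than stored inline in enlarged sectors; the function names are
made up:

```python
SECTOR_BYTES = 512


def entries_per_sector(entry_bytes):
    """How many per-sector check entries fit in one 512-byte sector."""
    return SECTOR_BYTES // entry_bytes


def space_overhead(entry_bytes):
    """Fraction of the device spent on check sectors, under the
    assumption of one check sector per entries_per_sector() data sectors."""
    n = entries_per_sector(entry_bytes)
    return 1 / (n + 1)
```

With 8 bytes per sector, 64 entries fit in one sector, so about 1.5%
of the device goes to checksums; with 50 bytes per sector only 10 fit,
which costs about 9% and happens to match the 10-plus-1 layout.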


--linas