Re: limits on raid

From: Bill Davidsen
Date: Thu Jun 21 2007 - 19:03:23 EST


I didn't get any comments on my suggestion for a quick-and-dirty fix for the --assume-clean issues...

Bill Davidsen wrote:
Neil Brown wrote:
On Thursday June 14, david@xxxxxxx wrote:
It's now churning away 'rebuilding' the brand-new array.

A few questions/thoughts.

Why does it need to do a rebuild when making a new array? Couldn't it just zero all the drives instead? (Or, better still, just record most of the space as 'unused' and initialize it as it starts using it?)

Yes, it could zero all the drives first. But that would take the same
length of time (unless p/q generation was very very slow), and you
wouldn't be able to start writing data until it had finished.
You can "dd" /dev/zero onto all drives and then create the array with
--assume-clean if you want to. You could even write a shell script to
do it for you.
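Zeroing first makes --assume-clean safe because of how RAID-5 parity works. Parity is the byte-wise XOR of the data blocks in a stripe, and the XOR of all-zero blocks is itself all zeros. So every stripe of a freshly zeroed array is already consistent. The toy model below checks this and contrasts it with disks holding leftover data. It is a minimal illustration, not md code; the block size and disk count are arbitrary assumptions. The same argument holds for RAID-6's Q syndrome, which is a sum of products of the data bytes and is therefore also zero over zeroed data.

```python
from functools import reduce

BLOCK = 8  # bytes per block in this toy model (real md chunks are far larger)

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks, i.e. RAID-5 parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def stripe_is_consistent(data_blocks, parity):
    """A stripe is clean when its parity equals the XOR of its data blocks."""
    return xor_blocks(data_blocks) == parity

# A 4-disk RAID-5 stripe (3 data + 1 parity) after dd'ing /dev/zero
# onto every member: all bytes are zero, parity included.
zeroed = [bytes(BLOCK) for _ in range(3)]
zero_parity = bytes(BLOCK)
print(stripe_is_consistent(zeroed, zero_parity))  # True

# Members holding leftover garbage: parity does not match the data.
garbage = [bytes([0x5A] * BLOCK), bytes([0x13] * BLOCK), bytes([0xC4] * BLOCK)]
print(stripe_is_consistent(garbage, bytes([0x77] * BLOCK)))  # False
```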

Yes, you could record which space is used vs unused, but I really
don't think the complexity is worth it.

How about a simple solution that would get an array online and still be safe? All it would take is a flag that forces reconstruct writes for RAID-5. You could set it with an option, or automatically when someone uses --assume-clean with --create, and leave it in the superblock until the first "repair" runs to completion. During that repair you could also assume that bad parity was not caused by an error but simply has never been written.
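The reason forcing reconstruct writes is safe becomes clear when you compare the two ways a partial-stripe RAID-5 write can update parity. Read-modify-write computes P' = P xor D_old xor D_new, which is only correct if the old parity P was already valid. Reconstruct-write recomputes parity from all the data blocks, so it ignores whatever stale parity is on disk. The toy model below illustrates this. The `force_rcw` argument is a hypothetical stand-in for the proposed superblock flag, not an existing md or mdadm option.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (RAID-5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

class Stripe:
    """Toy RAID-5 stripe: N data blocks plus one parity block."""

    def __init__(self, data, parity):
        self.data = list(data)
        self.parity = parity

    def write(self, index, new_block, force_rcw):
        if force_rcw:
            # Reconstruct-write: use every data block (the untouched ones would
            # be read from disk) and compute parity from scratch. Correct even
            # if the old parity was never written.
            self.data[index] = new_block
            self.parity = xor_blocks(self.data)
        else:
            # Read-modify-write: P' = P ^ D_old ^ D_new. Fewer reads on wide
            # arrays, but it carries forward any error already in P.
            self.parity = xor_blocks([self.parity, self.data[index], new_block])
            self.data[index] = new_block

    def consistent(self):
        return xor_blocks(self.data) == self.parity

def unsynced():
    # Stripe as it might look right after --create --assume-clean on old disks.
    return Stripe([b"\x11\x11", b"\x22\x22", b"\x44\x44"], parity=b"\x00\x00")

rmw = unsynced()
rmw.write(0, b"\xaa\xaa", force_rcw=False)
rcw = unsynced()
rcw.write(0, b"\xaa\xaa", force_rcw=True)
print(rmw.consistent(), rcw.consistent())  # False True
```

Once a full "repair" pass has made every stripe's parity valid, the flag could be cleared and the cheaper read-modify-write path used again.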

Thought 2: I think the unwritten bit is easier than you think: you only need it on parity blocks for RAID-5, not on data blocks. When a write is done, if the bit is set, do a reconstruct, write the parity block, and clear the bit. Keeping a bit per data block is madness, and appears to be unnecessary as well.
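Here is a minimal sketch of the per-parity-block "unwritten" bit described above, assuming one bit per stripe (RAID-5 has one parity block per stripe). The class and field names are hypothetical and this is a toy model, not md's implementation. All bits start set at create time. The first write to a stripe does a reconstruct and clears the bit, and later writes may use read-modify-write because the parity is now trustworthy.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (RAID-5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

class Raid5Sketch:
    """Toy RAID-5 with a hypothetical 'parity unwritten' bit per stripe."""

    def __init__(self, stripes, data_disks, block_size):
        self.block_size = block_size
        # Simulate stale disk contents left over after --assume-clean.
        self.data = [[bytes([(s * 31 + d * 7 + 1) & 0xFF]) * block_size
                      for d in range(data_disks)] for s in range(stripes)]
        self.parity = [b"\xff" * block_size for _ in range(stripes)]
        # One bit per parity block (per stripe), all set at creation.
        self.unwritten = [True] * stripes
        self.reconstruct_writes = 0

    def write(self, stripe, disk, block):
        if len(block) != self.block_size:
            raise ValueError("block size mismatch")
        old = self.data[stripe][disk]
        self.data[stripe][disk] = block
        if self.unwritten[stripe]:
            # Parity not trusted yet: rebuild it from all data, then clear the bit.
            self.parity[stripe] = xor_blocks(self.data[stripe])
            self.unwritten[stripe] = False
            self.reconstruct_writes += 1
        else:
            # Parity known good: read-modify-write is safe.
            self.parity[stripe] = xor_blocks([self.parity[stripe], old, block])

    def stripe_consistent(self, stripe):
        return xor_blocks(self.data[stripe]) == self.parity[stripe]

array = Raid5Sketch(stripes=4, data_disks=3, block_size=4)
array.write(0, 1, b"new!")
array.write(0, 2, b"more")
print(array.stripe_consistent(0), array.reconstruct_writes)  # True 1
```

Stripes that have never been written keep their bit set and may still hold mismatched parity. A repair pass could simply recompute those stripes instead of reporting them as errors.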

While I consider ZFS to be ~80% hype, one advantage it could have (but I don't know if it does) is that, since the filesystem and raid are integrated into one layer, they can optimize the case where files are being written onto unallocated space. Instead of reading blocks from disk to calculate the parity, they could just put zeros in the unallocated space, potentially speeding up the system by reducing the amount of disk I/O.

Certainly. But the raid doesn't need to be tightly integrated
into the filesystem to achieve this. The filesystem need only know
the geometry of the RAID and when it comes to write, it tries to write
full stripes at a time. If that means writing some extra blocks full
of zeros, it can try to do that. This would require a little bit
better communication between filesystem and raid, but not much. If
anyone has a filesystem that they want to be able to talk to raid
better, they need only ask...
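The benefit of full-stripe writes can be made concrete by counting the disk reads that a single RAID-5 stripe write costs. The simplified cost model below is an assumption, not md's actual scheduling logic. A partial-stripe write takes the cheaper of read-modify-write (read the old copies of the blocks being written plus the old parity) and reconstruct-write (read the untouched data blocks). A full-stripe write needs no reads at all. The helper names are hypothetical. The sketch assumes the write starts on a stripe boundary, and padding with zeros is only safe when the padded blocks are unallocated filesystem space.

```python
def reads_needed(blocks_written, data_disks):
    """Reads for one stripe write: the cheaper of RMW and RCW (toy model)."""
    if blocks_written == data_disks:
        return 0  # full stripe: parity comes purely from the new data
    rmw = blocks_written + 1            # old copies of written blocks + old parity
    rcw = data_disks - blocks_written   # data blocks not being written
    return min(rmw, rcw)

def total_reads(nblocks, data_disks):
    """Reads for a stripe-aligned write of nblocks; full stripes cost nothing."""
    _full, rest = divmod(nblocks, data_disks)
    return 0 if rest == 0 else reads_needed(rest, data_disks)

def pad_to_full_stripes(blocks, data_disks, block_size):
    """Hypothetical filesystem helper: append zero blocks to fill the last stripe."""
    remainder = len(blocks) % data_disks
    if remainder:
        blocks = blocks + [bytes(block_size)] * (data_disks - remainder)
    return blocks

# 5-disk RAID-5: 4 data blocks per stripe. Writing 6 blocks leaves a
# partial second stripe, which costs extra reads.
blocks = [b"data"] * 6
print(total_reads(len(blocks), 4))  # 2
padded = pad_to_full_stripes(blocks, data_disks=4, block_size=4)
print(len(padded), total_reads(len(padded), 4))  # 8 0
```

The trade-off is two extra block writes of zeros in exchange for avoiding two reads before the parity can be computed.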

Is there any way that Linux would be able to do this sort of thing? Or is it impossible because the layering prevents the necessary knowledge from being in the right place?

Linux can do anything we want it to. Interfaces can be changed. All
it takes is a fairly well defined requirement, and the will to make it
happen (and some technical expertise, and lots of time .... and
coffee?).

Well, I gave you two thoughts: one that would be slow until a repair completes but sounds easy to do, and one that is slightly harder but works better and minimizes the performance impact.



--
bill davidsen <davidsen@xxxxxxx>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
