Re: RFC: from FIBMAP to FIONDEV

Matthew Wilcox (Matthew.Wilcox@genedata.com)
Fri, 11 Jun 1999 13:26:24 +0200


On Fri, Jun 11, 1999 at 12:42:15PM +0200, Werner Almesberger wrote:
> I'm not so sure what to do about RAIDs. In many cases, the boot loader
> wouldn't even know if something is wrong. (And, fortunately, in most
> cases, it shouldn't be affected either.) With more complex RAIDs,
> reconstruction is non-trivial, and in addition you need to know what
> RAID mode you're in, etc.

I'm not sure you do need to know which RAID mode you're in. All you
need to do is know where to load a block from, and what to do with it
when you get it (in the case of RAID-[2345]).

The point you make about `how does the boot loader know that one of the
disks has failed' is well taken though -- how can it know that disc
3 has failed and has been replaced by a new disc with no data on it,
so it should not load any blocks from disc 3?

> Furthermore, there's not enough space in the first stage to do anything
> with RAIDs anyway. So you'll end up with only a partial recovery
> capability.

How much space is left? Recovering from a dud disc should only be a
matter of performing an XOR over a block. It's up to the kernel to
recreate the missing disc, of course.
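
As a rough sketch of that XOR step: this assumes a single-parity layout
(RAID-4/5 style) and that the loader already knows which member is
missing. The function name and the 512-byte block size are made up for
illustration, not taken from LILO or the md driver.

```c
#include <stddef.h>

#define BLOCK_SIZE 512  /* assumed sector size, for illustration only */

/*
 * Rebuild the block of one missing member of a single-parity stripe.
 * blocks[] holds pointers to the corresponding block of every member
 * (data and parity alike); the entry at index `missing` is ignored.
 * The XOR of all surviving blocks is written to out.
 */
static void rebuild_block(const unsigned char *const blocks[], size_t nmembers,
                          size_t missing, unsigned char out[BLOCK_SIZE])
{
    size_t i, d;

    for (i = 0; i < BLOCK_SIZE; i++)
        out[i] = 0;

    for (d = 0; d < nmembers; d++) {
        if (d == missing)
            continue;   /* never touch the failed or replaced disc */
        for (i = 0; i < BLOCK_SIZE; i++)
            out[i] ^= blocks[d][i];
    }
}
```

The loop itself is tiny; the cost in a first stage would be reading the
extra blocks, not the XOR.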

> So a useful solution would be more along the lines of creating two or
> more similar but not identical installations of LILO and to have some
> means to switch between them. For maximum reliability, you probably
> want your BIOS to decide which disk is okay, and boot that one.

That would work.

> Now all this is beyond FIONDEV, because you either need to go to the
> "real" partitions or you need to have some means to select one of the
> redundant "forks" - not only for FIONDEV, but also for reading and
> writing.

So do you approve of the idea of adding an index to the call to allow the
application to find out about all possible real blocks which contain
this data?
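
To make the question concrete, here is a purely hypothetical sketch of
such an indexed lookup. Nothing in it is the actual FIONDEV interface:
the structures, the field names and map_block() are invented, and a
small in-memory RAID-1 table stands in for the kernel.

```c
#include <stddef.h>

/* Hypothetical request: which real device and block hold copy
 * number `index` of logical block `logical`? */
struct blk_query {
    unsigned long logical;   /* in:  block on the RAID device */
    unsigned int  index;     /* in:  which copy, counting from 0 */
    unsigned int  dev;       /* out: real device number */
    unsigned long physical;  /* out: block on that device */
};

/* Stand-in for a RAID-1 set: every member holds every block at the
 * same offset from its own starting block. */
struct mirror_set {
    size_t nmembers;
    const unsigned int  *devs;
    const unsigned long *start;
};

/* Returns 0 and fills in dev/physical, or -1 once index runs past
 * the last copy. */
static int map_block(const struct mirror_set *m, struct blk_query *q)
{
    if (q->index >= m->nmembers)
        return -1;
    q->dev = m->devs[q->index];
    q->physical = m->start[q->index] + q->logical;
    return 0;
}
```

The application would raise index from 0 until the call fails, and
record each (dev, physical) pair it gets back.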

> So in terms of complexity, the best approach is probably to create
> two or more small "normal" partitions on sufficiently different
> drives, and to install LILO, the kernel(s), etc. on each of them.
> Then add whatever is necessary to pick the right one for booting.

I think people would prefer to just make their root partition a raid-1
device and have lilo do the right thing. I don't think it adds too
much complexity to support raid-1. As I indicated, I thought the
raid-[2345] support might be a little over the top :-)

> Note that recovery from disk corruption due to an unclean shutdown
> isn't a likely cause of problems, because you're not supposed to
> boot new kernels that frequently on your high availability system in
> the first place ;-)

Hehe. I once developed on a machine with 8 IDE drives...

-- 
Matthew Wilcox <willy@bofh.ai>
"Windows and MacOS are products, contrived by engineers in the service of
specific companies. Unix, by contrast, is not so much a product as it is a
painstakingly compiled oral history of the hacker subculture." - N Stephenson
