Re: Volume Managers in Linux

Greg Mildenhall (greg@networx.net.au)
Wed, 4 Nov 1998 13:13:03 +0800 (WST)


On Tue, 3 Nov 1998, James Fidell wrote:
> Quoting Theodore Y. Ts'o (tytso@MIT.EDU):
> > I've never claimed that the ext2 is the best way to do RAID; I think MD
> > is the way to do that. However, allowing ext2 to be able to support
> > filesystems which span multiple block devices is a good thing to do, and
> > a cleaner way of supporting multivolume support.
> What "feels wrong" about this to me is that all fs implementations are
> then required to implement multiple device spanning, or they can't be
> used on spanning partitions at all.
Yes. Far better to my mind would be a single block device that acts just
as md does now, but then add a mechanism for ext2fs, or any other fs
driver, to gather the information it needs to behave optimally, should it
choose to.
For example[1]: the ext2 driver will always try to put a data block as
close to its inode as possible. If block x holds the inode, and block
x+1 is free, the driver will plonk data there more often than not. But if
block x and block x+1 are on different physical devices, much of the
advantage in this is lost. If the filesystem driver has a way of knowing
that there is a device boundary after block x, it will perhaps look at x-1
instead.
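
To make the idea concrete, here is a minimal sketch in Python. It is not the real ext2 allocator, and it is just as over-simplified as the example above. The names `device_of`, `pick_data_block` and the `boundaries` list are hypothetical; the sketch simply assumes the block device can hand the fs driver a sorted list of the block numbers at which a new physical device starts.

```python
from bisect import bisect_right

def device_of(block, boundaries):
    """Index of the physical device holding `block`.

    `boundaries` is a sorted list holding the first block number of every
    device after the first one (hypothetical hint from the block layer).
    """
    return bisect_right(boundaries, block)

def pick_data_block(inode_block, free, boundaries, total_blocks):
    """Pick a free block near the inode, preferring the inode's own device.

    Candidates are tried in the order x+1, x-1, x+2, x-2, ...
    Returns None if no block is free at all.
    """
    home = device_of(inode_block, boundaries)
    fallback = None  # nearest free block found on some other device
    for dist in range(1, total_blocks):
        for cand in (inode_block + dist, inode_block - dist):
            if 0 <= cand < total_blocks and cand in free:
                if device_of(cand, boundaries) == home:
                    return cand
                if fallback is None:
                    fallback = cand
    return fallback
```

With a device boundary after block 99, `pick_data_block(99, {98, 100}, [100], 200)` chooses block 98 rather than 100. With an empty `boundaries` list (no hint available) the same call falls back to the plain "nearest block" behaviour, so a driver that ignores the hint loses nothing.
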
All of the information ext2 needs to optimise such things ought to be
available to it _without_ it having to span multiple devices. That way,
Bill can come in in the middle of the night and reformat it to NTFS, and
he won't be confused by "NTFS driver is still in beta and does not support
multidevice spanning" errors, while the ext2 driver is as optimal as it
would be otherwise. And because we have changed only the driver, not the
filesystem layout, we can boot off our stock debian rescue floppy and read
the fs when we trash our bootsector with the new ext2 driver code :)

[1]: This is a vast over-simplification. To the point of being wrong.
If you are a trainee fs-hacker, please forget everything written here.

> Conceptually it seems simpler to have the virtual layer which understands
> how to span multiple partitions, but which looks like a block device from
> the "user" view, thus allowing any filesystem type to be used upon it, be
> that ext2, reiserfs, ufs or something even better that we haven't even
> thought of yet.
Hey, I just said that!

> In this respect, MD seems like it's heading in the right
> direction, though I believe it needs more support for mirroring and
> striping (ie striped mirrors, or mirrors of stripes ?),
And it would be very nice if it could do this orthogonally to filesystem
and fs-driver implementation where possible.

> better management tools
Now we get to the other stuff - what happens when we change things.
When we change our set of physical devices on-the-fly, the fs layer
should not notice. Since it is presented with one big contiguous device,
all it should need to know is how to respond to "just added some nice new
blocks for you onto the end of the device" or "get your lazy ass off the
last x amount of blocks, I want those". If it's not a contiguous space at
the end of the device that we want to deal with, the MD layer ought to
deal with shuffling things around, and let ext2 think it's as simple as
the above. That gives us a system that works, with minimal change to fs
drivers (none, if we don't want the FS driver to make use of resizable
block devices) - remember, changes in block device drivers only have to be
done _once_.
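
The two messages described above could be sketched roughly as follows. This is a toy model in Python, not kernel code; `ToyFs`, `on_grow` and `on_shrink` are hypothetical names for whatever hooks a block device might call, and the "filesystem" is nothing more than a set of used block numbers.

```python
class ToyFs:
    """Toy filesystem that only tracks which block numbers are in use."""

    def __init__(self, size):
        self.size = size   # number of blocks the device currently offers
        self.used = set()  # block numbers holding data

    def on_grow(self, extra):
        """Block layer says: 'just added `extra` blocks onto the end'."""
        self.size += extra

    def on_shrink(self, count):
        """Block layer says: 'get off the last `count` blocks'.

        Returns a dict {old_block: new_block} of the relocations made,
        or None if there is not enough free space to vacate the tail.
        """
        if count > self.size:
            raise ValueError("cannot remove more blocks than the device has")
        new_size = self.size - count
        doomed = sorted(b for b in self.used if b >= new_size)
        free_low = [b for b in range(new_size) if b not in self.used]
        if len(free_low) < len(doomed):
            return None  # refuse; the device must stay as it is
        moves = {}
        for src, dst in zip(doomed, free_low):
            self.used.remove(src)
            self.used.add(dst)
            moves[src] = dst
        self.size = new_size
        return moves
```

The point of the sketch is that the fs only ever sees the end of one contiguous device growing or shrinking; which physical disk came or went is the block layer's business.
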
Of course, for performance, we might want to tweak individual fs drivers.
In the earlier example, ext2fs will need to be told to rethink its
assumptions about the physical device boundaries, so that new blocks it
adds will be optimal. If the old decisions have been rendered sub-optimal,
then ext2 (or an offline defragmenter) might want to reorder things a bit
- that's ok, we can add that in later, we don't have to worry about it for
core functionality. That's stuff we'd have to do anyway.
Because the fs driver can get the relevant info, it should be able to sort
itself out in its own good time, independently of the block devices.

> and full error recovery.
Unless you're talking about on-the-fly RAID error recovery (which has _no_
place in the fs layer :) this is stuff that can happen in userspace (a new,
improved fsck-ext[2,3,4,5,6..]). Adding journaling extensions into ext2,
tweaking the RAID code for robustness, or whatever - these things should
be able to be confined to either the block device or the fs driver.

I guess what I'm trying to say is that we have a long tradition of
managing to keep things as orthogonal and independent as possible - a good
design methodology in any case, but an essential one for such a
distributed development process - and there would have to be _really_ good
reasons to break that tradition here. I can't see any, but I know there
are those out there far more qualified than I.

-Greg Mildenhall
