Re: (reiserfs) Re: LVM / Filesystems / High availability

Stephen C. Tweedie (sct@dcs.ed.ac.uk)
Wed, 24 Jun 1998 12:37:32 +0100


Hi,

On Tue, 23 Jun 1998 18:40:06 +0200, Florian Lohoff
<flo@quit.mediaways.net> said:

>> Virtual disks for redundancy or performance are just fine, but
>> when it comes to filesystem sizing, the fs has to be actively involved
>> in any change. Given that, we can actually implement the whole thing in
>> the filesystem.

> I would not like to see a special ext2 implementation for resizing,
> holes, that the ext2 cares on "physical devices" and reorganization of
> blocks.

> The LVM approach with the "virtual block device" makes many things
> much easier. You can keep filesystem code very simple,

I believe things are actually _simpler_ if you keep it in the
filesystem.

> and the LVM code also isn't very complex. The only thing you might take
> care on is the Block Allocation of the LVM which you might do as
> complex and intelligent as you like but a bug in there will NOT cause
> data to get lost or corrupt. The PEs will just not be there where they
> should and it would be just a performance/reliability problem which
> you might fix on the fly without any filesystem interaction. Also
> creating a mirror, raid5, stripe of a simple filesystem on the fly
> would be VERY easy, MUCH easier than doing it in the filesystem level.

Mirroring, raid and striping are all things which should be done in the
device layers, as I said in the first place. However, as soon as we
start talking about online resizing (and it is specifically online
resizing which I am talking about --- offline is an entirely separate
issue), then the filesystem really has got to be involved and
interacting with another component just makes it more complex.

> I think a big approach is to bring down complexity, as this makes
> things more unstable, and more difficult to bugfix, and the LVM is a
> step to bring functionality/features to filesystems without a big
> complexity increase.

_Some_ features, yes. Not dynamic resizing.

>> Miguel's prototype LVM stuff works by letting you mke2fs a new partition
>> and then daisy-chain that new device on to the end of the existing
>> filesystem, at run time, while it is all mounted. Removing such a

> This is still (i am sure) very difficult and not that easy as it
> sounds here.

It IS easy. ext2fs already has multi-layered allocation; allocating
inodes or blocks first has to search for a suitable block group, then
for a free entry in that group. Adding extra code to the block group
search to scan multiple bound filesystems is easy. Adding code to the
inode or block lookup to partition the name spaces over those bound
filesystems is easy. This work is _done_. It works just fine. It's
the management issues which are harder --- working out how to deal with
mounting filesystems; where do you specify the filesystem devices, in
superblock or fstab; what to do if a device is inaccessible, and so on.
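
As a rough illustration of the two easy parts described above, here is a
minimal Python sketch. It is not the ext2fs or prototype code; the names
BlockGroup, BoundFs, alloc_block and locate are hypothetical, and a list of
booleans stands in for the on-disk block bitmaps. Allocation scans the block
groups of each bound filesystem in turn, and lookup partitions one global
block number space over the members of the bound set.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BlockGroup:
    free: List[bool]  # True means the block is free (stand-in for a bitmap)

    def alloc(self) -> Optional[int]:
        """Second level: find a free entry inside this group."""
        for i, is_free in enumerate(self.free):
            if is_free:
                self.free[i] = False
                return i
        return None


@dataclass
class BoundFs:
    name: str                  # hypothetical device label
    groups: List[BlockGroup]

    @property
    def nblocks(self) -> int:
        return sum(len(g.free) for g in self.groups)


def alloc_block(bound: List[BoundFs]) -> Optional[int]:
    """First level: search every group of every bound filesystem.

    Returns a global block number, or None if the whole set is full.
    """
    base = 0
    for fs in bound:
        for group in fs.groups:
            idx = group.alloc()
            if idx is not None:
                return base + idx
            base += len(group.free)
    return None


def locate(bound: List[BoundFs], blk: int) -> Tuple[str, int]:
    """Map a global block number to (member name, block within member)."""
    for fs in bound:
        if blk < fs.nblocks:
            return fs.name, blk
        blk -= fs.nblocks
    raise ValueError("block number beyond the bound set")
```

The management questions (where the member list lives, what happens when a
member is missing) are deliberately left out of the sketch, since those are
the harder part.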

> And still - you are tightly bound to physical devices (read:
> partitions, drives etc)

Yes. That's why I'd really like to know if this is a major problem. As
far as I am concerned, simplicity really does dictate doing this just in
the filesystem. We already get much independence from physical disks by
having things such as raid in the block device layers. Is there any
compelling need to be able to have such fine grained control over
partition allocation as you get from an LVM, given that the ultimate aim
with the filesystem-based solution should be to let you (a) add a
new partition to the bound filesystem set, and then (b) remove one or
more of the original partitions, to achieve a similar effect? The
biggest downside is that it is likely to be more expensive in terms of
performance to do the removal and remapping from within the fs than from
an LVM.

> I don't think that this complicates the things. We only need some
> interaction between filesystems and devices. Like the filesystem
> telling the device "I would like you to shrink by 4 GB, tell me if you
> are able to do this" "Could you please shrink now by 4 GB, tell me
> when ready" ...

No no no. The shrinking of the filesystem is HARD. We have to
implement it whatever we do. Shrinking a bunch of blocks off of the end
of the filesystem is no easier than shrinking a set from the middle of
the filesystem, so if we have a filesystem composed of bound partitions,
then removing one from the middle doesn't require any LVM magic to make
it appear as if a block device is simply shrinking. Once the shrinking
is working, it is simple just to evict a partition from the bound set,
without having to interact with any other software.
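
To make the eviction step concrete, here is a minimal Python sketch under
loud assumptions: the function name evict and the data layout (a dict from
member name to a list of block contents, with None meaning free) are
hypothetical, and the returned move table only stands in for the hard part,
which is rewriting every inode and indirect block that points at a moved
block. It is not the prototype's code.

```python
from typing import Dict, List, Optional, Tuple

Location = Tuple[str, int]  # (member name, block index within that member)


def evict(members: Dict[str, List[Optional[str]]],
          victim: str) -> Dict[Location, Location]:
    """Move every in-use block off `victim`, then drop it from the set.

    Returns a table of old location -> new location. A real filesystem
    would use this to update all pointers to the moved blocks.
    """
    # Free slots available on the members that will remain.
    free = [(dev, i)
            for dev, blocks in members.items() if dev != victim
            for i, content in enumerate(blocks) if content is None]
    used = [i for i, content in enumerate(members[victim])
            if content is not None]
    if len(used) > len(free):
        raise RuntimeError("not enough free space to evict " + victim)

    moves: Dict[Location, Location] = {}
    for src, (dev, dst) in zip(used, free):
        members[dev][dst] = members[victim][src]
        moves[(victim, src)] = (dev, dst)

    del members[victim]
    return moves
```

Note that the victim can sit anywhere in the set; nothing in the sketch
depends on it being the last member, which is the point made above about
shrinking from the middle being no harder than shrinking from the end.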

The filesystem-based solution also allows you to do this sort of
management to ANY filesystem, regardless of whether or not you thought
you'd need the feature when you first mounted it.

> BTW: I feel a bit like ext2 is going the Microsoft way of doing
> things. Keep as much compatibility as possible, and therefore accept
> compromises.

Yes and no. It's entirely deliberate that Linux emphasises stability
and reliability, and we are absolutely not going to drop that as a
priority in ext2fs. The compatibility issue is important, but not
overriding; it comes out as a secondary effect from the overwhelming
priority to not throw away working code and reimplement unless there is
compelling need.

There are some _really_ impressive efforts going on right now, such as
reiserfs, to develop new filesystems for the next generation. Ext2fs
cannot afford to follow if that hurts stability.

The result is that we've got a lot of good stuff coming for ext2fs,
including major performance and reliability improvements such as the
btree and journalling work, but massive overhauls of filesystem code
deserve to be part of the next generation of filesystems, NOT ext2fs.

> This has led Microsoft to installing a 32-bit OS into a 16-bit FAT
> partition. We do not have the need for quick return-of-invest and
> commercial success, so we might choose the BEST TECHNICAL SOLUTION,
> and we don't need to take compromises.

I _am_ looking for the best technical solution here. However, amongst
solutions of equal merit, I will take the simplest every time. For
redundancy/striping, that means doing it in the LVM. For filesystem
size management, I believe that means doing it in the fs.

--Stephen
