idea for filesystem to LVM interface

Matt Grosso (mgrosso@acm.org)
Fri, 26 Jun 1998 19:23:00 +0000


Hi All,

I'm new to kernel hacking, so please, get harsh on this idea before I
waste any more time on it.

The ideas discussed so far for growable/shrinkable filesystems seem to
me to be modeled on the process-to-kernel brk/sbrk interface. Why not
implement an LVM as something that doles out and takes back relatively
small (4 MB to 128 MB) chunks of disk space, each of which is linearly
addressable on its own? In other words, model the interface on
malloc/free rather than brk/sbrk.

The thing that suggested this idea is that ext2fs already thinks about
its space internally in terms of block groups, so maybe this is a more
natural way to write a filesystem. (Easy for me to say now!)

In one version of this approach the intra-kernel interface would look
like this...

chunk_t *chunk_alloc( lvolume_t *lvolume, long blcks );
int chunk_write( chunk_t *chunk, long blck, const char *buf );
int chunk_read( chunk_t *chunk, long blck, char *buf );
int chunk_free( lvolume_t *lvolume, chunk_t *chunk );

where blcks is the requested number of contiguous blocks, all reads
and writes use the same (logical volume dependent) block size, and the
blck argument to read and write specifies the block offset within the
chunk. Those blocks might not be physically contiguous of course, but
they would seem that way to the filesystem. Storage space allocation in
this scheme is driven by the filesystems' demand but managed by the LVM.
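
To make the semantics concrete, here is a rough userspace mock of those
four calls. It is only a sketch under my own assumptions: malloc'd memory
stands in for disk, the lvolume_t and chunk_t layouts and the
LV_BLOCK_SIZE constant are invented for illustration, and blck is a long
in both read and write.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define LV_BLOCK_SIZE 512           /* assumed per-volume block size */

typedef struct lvolume {
    long free_blocks;               /* blocks not yet handed out */
} lvolume_t;

typedef struct chunk {
    long  nblocks;                  /* size of this chunk in blocks */
    char *data;                     /* memory standing in for disk extents */
} chunk_t;

/* Hand out a chunk of 'blcks' logically contiguous blocks, or NULL. */
chunk_t *chunk_alloc(lvolume_t *lvolume, long blcks)
{
    chunk_t *chunk;

    assert(lvolume != NULL);
    if (blcks <= 0 || blcks > lvolume->free_blocks)
        return NULL;
    chunk = malloc(sizeof *chunk);
    if (chunk == NULL)
        return NULL;
    chunk->data = calloc((size_t)blcks, LV_BLOCK_SIZE);
    if (chunk->data == NULL) {
        free(chunk);
        return NULL;
    }
    chunk->nblocks = blcks;
    lvolume->free_blocks -= blcks;
    return chunk;
}

/* Write one block at offset 'blck' within the chunk; -1 if out of range. */
int chunk_write(chunk_t *chunk, long blck, const char *buf)
{
    if (blck < 0 || blck >= chunk->nblocks)
        return -1;
    memcpy(chunk->data + (size_t)blck * LV_BLOCK_SIZE, buf, LV_BLOCK_SIZE);
    return 0;
}

/* Read one block at offset 'blck' within the chunk; -1 if out of range. */
int chunk_read(chunk_t *chunk, long blck, char *buf)
{
    if (blck < 0 || blck >= chunk->nblocks)
        return -1;
    memcpy(buf, chunk->data + (size_t)blck * LV_BLOCK_SIZE, LV_BLOCK_SIZE);
    return 0;
}

/* Give the chunk's blocks back to the volume. */
int chunk_free(lvolume_t *lvolume, chunk_t *chunk)
{
    lvolume->free_blocks += chunk->nblocks;
    free(chunk->data);
    free(chunk);
    return 0;
}
```

The point of the mock is that the filesystem only ever sees (chunk, blck)
pairs; where the blocks really live is the LVM's business.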

One advantage of this approach is that a clever LVM could prevent much
fragmentation without the filesystem knowing, just by good design of its
chunk allocation algorithm, although the filesystem could still become
fragmented within its chunks.

A traditional static-sized filesystem could be trivially ported to work
with this interface by calling
chunk_alloc(lvolume, some_permanent_partition_size) when it was created
and subsequently treating that chunk exactly as it treated partitions
previously, only substituting chunk_write/chunk_read for whatever internal
function was being used to write/read blocks before. This would buy you
some quick compatibility, and make testing and debugging the LVM simpler,
although you wouldn't get the benefits of the LVM.

In another implementation of this idea all chunk_t's would be the same
size, so chunk_alloc would need no size argument and would look like...
chunk_t *chunk_alloc( lvolume_t *lvolume );
... which would move complexity from the LVM layer to the filesystem and
make it difficult to port existing filesystems.
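
Part of that extra complexity is that the filesystem itself has to turn
a flat block number into a (chunk, offset) pair and work out how many
chunks to ask for. A minimal sketch of that arithmetic, assuming 4 MB
chunks of 512-byte blocks (CHUNK_BLOCKS, struct chunk_addr and both
function names are made up for illustration):

```c
#include <assert.h>

#define CHUNK_BLOCKS 8192L  /* assumed: 4 MB chunk / 512-byte blocks */

struct chunk_addr {
    long chunk;             /* index into the filesystem's chunk table */
    long blck;              /* block offset within that chunk */
};

/* Map a filesystem-wide block number onto a fixed-size chunk. */
struct chunk_addr fs_block_to_chunk(long fs_block)
{
    struct chunk_addr addr;

    assert(fs_block >= 0);
    addr.chunk = fs_block / CHUNK_BLOCKS;
    addr.blck  = fs_block % CHUNK_BLOCKS;
    return addr;
}

/* Number of chunks needed to hold fs_blocks blocks, rounding up. */
long chunks_needed(long fs_blocks)
{
    assert(fs_blocks >= 0);
    return (fs_blocks + CHUNK_BLOCKS - 1) / CHUNK_BLOCKS;
}
```

With variable-sized chunks none of this bookkeeping is needed for the
simple one-chunk port, which is why the first version is friendlier to
existing filesystems.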

To add space to a filesystem you would simply add partitions (or logical
block devices encapsulating RAIDn) to the LV on which that filesystem
was mounted. This architecture leaves open the issue of whether the LVM
implements RAIDn, striping, journaling etc. itself, or uses block devices
that do that for it. Personally I lean towards the latter for reasons that
other people have already mentioned in the
" Re: (reiserfs) Re: LVM / Filesystems / High availability "
thread, but in any case the filesystem is now insulated from that decision.

In this architecture, a given Logical Volume is created using one or more
block devices, some of which may be real, some of which may be RAIDn.
Block devices may be added or removed from the LV at runtime but a block
device can (obviously) only belong to one LV at a time. The system
call to remove a device would (with an option flag) attempt to relocate
used chunks on that device to the remaining block devices, without the
filesystem needing to know or care!
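
The bookkeeping side of device removal might look roughly like the
sketch below. Everything in it is an assumption for illustration (struct
lv, the fixed-size tables, lv_remove_device and its relocate flag, and
the first-fit placement); it tracks only which device holds each chunk
and does no actual data copying. It plans all the moves first and
commits only if every chunk finds a new home, so a failed removal leaves
the volume untouched.

```c
#include <assert.h>

#define MAX_DEVS   8
#define MAX_CHUNKS 64

struct lv {
    int  ndevs;
    long dev_free[MAX_DEVS];        /* free blocks per device, -1 = removed */
    int  nchunks;
    long chunk_blocks[MAX_CHUNKS];  /* size of each chunk in blocks */
    int  chunk_dev[MAX_CHUNKS];     /* device currently holding each chunk */
};

/*
 * Remove device 'dev' from the volume.  If 'relocate' is nonzero, chunks
 * living on it are first moved (first fit) to the remaining devices.
 * Returns 0 on success, -1 if the device is in use and cannot be emptied.
 */
int lv_remove_device(struct lv *lv, int dev, int relocate)
{
    long free_left[MAX_DEVS];
    int  plan[MAX_CHUNKS];
    int  i, d;

    assert(dev >= 0 && dev < lv->ndevs);
    for (d = 0; d < lv->ndevs; d++)
        free_left[d] = lv->dev_free[d];

    /* Pass 1: plan a new home for every chunk on 'dev'; change nothing. */
    for (i = 0; i < lv->nchunks; i++) {
        plan[i] = lv->chunk_dev[i];
        if (lv->chunk_dev[i] != dev)
            continue;
        if (!relocate)
            return -1;              /* in use, and caller did not ask to move */
        for (d = 0; d < lv->ndevs; d++)
            if (d != dev && free_left[d] >= lv->chunk_blocks[i])
                break;
        if (d == lv->ndevs)
            return -1;              /* no remaining device has room */
        free_left[d] -= lv->chunk_blocks[i];
        plan[i] = d;
    }

    /* Pass 2: commit the plan and mark the device as gone. */
    for (i = 0; i < lv->nchunks; i++)
        lv->chunk_dev[i] = plan[i];
    for (d = 0; d < lv->ndevs; d++)
        lv->dev_free[d] = free_left[d];
    lv->dev_free[dev] = -1;
    return 0;
}
```

Since the filesystem only holds chunk_t handles, nothing it knows about
changes when chunk_dev[] is rewritten underneath it.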

Initially, requests to read or write from a chunk that was
being moved would simply block until the chunk had been relocated. Later,
you could get more sophisticated about it.

In the initial implementation there could only be one filesystem per
volume group, but ultimately you could support several filesystems per
volume group, which would allow for very efficient use of that volume
group's available space, and for things like filesystem hard and soft disk
quotas etc.

I'd love to hear what people think are the flaws/good points of this.

Matt

ps- I'm only on the linux-kernel-digest, so I would appreciate a cc: to
mgrosso@acm.org. Thanks

pps- Is there really a linux filesystem development mailing list? Can
anyone direct me to that? web page?

-- 
######################################################################
## mgrosso@acm.org		http://www.neterra.com
## senior software engineer 
######################################################################
