Re: [patch] pagecache-2.3.9-H3, bmap & ext2fs cleanup patch

Ingo Molnar (mingo@chiara.csoma.elte.hu)
Mon, 28 Jun 1999 00:33:18 +0200 (CEST)


On Sun, 27 Jun 1999, Linus Torvalds wrote:

> > no, RAID is really a different concept, and it's filesystem independent.
> > multi-device filesystems _might_ be useful but the cleanest way of
> > handling striping and/or redundancy is IMO MD at the block device layer.
> > Caching and block allocation decisions are best done on the filesystem
> > level.
>
> The best argument for having the filesystem know about devices is to have
> filesystems that can "migrate" files from one device to another. It can be
> done other ways too, but basically it is really nice for "intelligent"
> filesystems that do (for example) dynamic load balancing or that have
> multiple levels of caching.

> it's so simple and so "obvious" (the "Mapped" attribute really means that
> both the block number and the device is known, so it makes much more
> conceptual sense to have the mapper do both).

yep. The question was about 'RAID' and most RAID concepts are quite
orthogonal to multidevice mappings. We obviously do not really want to
handle stuff like mirroring or checksumming in the filesystem itself. We
might want to handle other stuff there - hierarchical inode-based storage
is a good example.

Although there are good and highperforming examples of block-based
hierarchical storage implementations as well. I have started coding
something like that, it's in the latest 'alpha' RAID patches, although
it's functional to some degree, it's definitely not usable yet. It's a
virtual block device that allocates blocks on-demand and maps it to
physical devices. This means i can already create a 100G filesystem on a
1G disk no problem - more storage can be added if the need arises. I can
also easily relocate blocks 'under the filesystem', the filesystem does
not notice this. My main goal was/is to enable some sort of dynamic
block-optimizer that relocates blocks based on access patterns. Here are
some basic structures:

/*
* Identifies a block in physical space
*/
typedef struct phys_idx_s {
__u16 phys_nr;
__u32 phys_block;

} PACKED phys_idx_t;

/*
* Identifies a block in logical space
*/
typedef struct log_idx_s {
__u16 log_id;
__u32 log_index;

} PACKED log_idx_t;

/*
* Pointer pointing from the physical space, points to
* the LV owning this block. It also contains various
* statistics about the physical block.
*/
typedef struct pv_pptr_s
{
union {
/* case 1 */
struct {
log_idx_t owner;
log_idx_t predicted;
__u32 last_referenced;
} used;
/* case 2 */
struct {
__u16 log_id;
__u16 __unused1;
__u32 next_free;
__u32 __unused2;
__u32 __unused3;
} free;
} u;
} PACKED pv_pptr_t;

_personally_ i find 'block based hierarchical storage' more sexy because i
think it has much bigger flexibility and feature-range than fs-level
solutions. I can use the above logical device even for MS-DOS partitions.
I can seemlessly relocate blocks to balance load better, i can
(theoretically) 'archive' blocks to tape.

But most importantly, i am recording usage pattern ("access chains") in a
_persistant_ way, it's very similar to CPU branch prediction. It has both
a prediction strenght, an access timestamp, access frequency and access
type statistics part, for every physical block. Lots of things can be
achieved based on that:

- accurate defragmentation
- accurate nonlinear read prefetching
- identification and relocation of access 'hot spots'

but i can also 'unmap' or migrate any part of the physical space without
disturbing 'logical space' (the filesystem), naturally this works just
fine when the filesystem is mounted. This means i can remove old disks
seemlessly and replace them with newer ones. And this all happens with
block granularity.

-- mingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/