Re: [RFC 0/13] extents and 48bit ext3

From: linux
Date: Sun Jun 11 2006 - 04:21:20 EST


> We seem to be lagging behind "the industry" in some areas - handling large
> devices, high bandwidth IO, sophisticated on-disk data structures, advanced
> manageability, etc.

Er... I would like to point out that "sophisticated on-disk data
structures" are, in and of themselves, a Bad Thing. It's only when
they provide some desirable capability that they earn their cost in
implementation difficulty, code size, and bug rate.


ZFS is interesting, and I Really Really Like its reliability guarantees,
but I notice that, due to the append-only nature of its operation,
it's extraordinarily difficult to move data once it's been written.
This makes migrating a file system off old nasty disks onto big new
disks rather annoying. If you plan the migration before you add the new
drives, you can physically mirror the old disks onto them and avoid
changing block pointers, but I'd wish for something more flexible.

Because block pointers are physical, and all checksummed, moving a
single block means rewriting its parent pointer block, that block's
parent, and so on up to the root block of every snapshot that contains
it. Now, you can keep an index of "old block X is now in new location
Y" while walking the entire file system until you're sure that all the
old pointers are gone, but it's hard to preallocate that index. It
also has to record that "old pointer block X has been recreated at new
location Y, but its contents are different; only the logical content is
the same", and there's no obvious way to bound the number of such
forwarding notes that need to be made.

You must have such an index, or you can't preserve sharing while you
migrate the data.
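
To make the sharing point concrete, here is a minimal sketch in Python.
It is not ZFS code: the flat "a snapshot is a list of addresses" model,
the migrate() helper and the fake allocator starting at 1000 are all
made up for illustration. It only shows that, without the index, a
block referenced by two snapshots gets copied twice and stops being
shared.

```python
# Toy model (hypothetical, not ZFS): a snapshot is a list of physical
# block addresses, and two snapshots may list the same address.

def migrate(snapshots, old_disk, forwarding=None):
    """Relocate every block whose address is in old_disk.

    forwarding is the "old block X is now at Y" index; pass None to
    see what happens without one.  Returns (new_snapshots, copies).
    """
    copies = 0
    new_snapshots = []
    for snap in snapshots:
        new_snap = []
        for addr in snap:
            if addr not in old_disk:
                new_snap.append(addr)              # block stays put
            elif forwarding is not None and addr in forwarding:
                new_snap.append(forwarding[addr])  # already moved: reuse
            else:
                new_addr = 1000 + copies           # pretend allocation
                copies += 1
                if forwarding is not None:
                    forwarding[addr] = new_addr
                new_snap.append(new_addr)
        new_snapshots.append(new_snap)
    return new_snapshots, copies


snaps = [[1, 2, 30], [1, 2, 40]]   # blocks 1 and 2 are shared
old = {1, 2}

naive, naive_copies = migrate(snaps, old)
shared, shared_copies = migrate(snaps, old, forwarding={})
```

In this toy run the version without an index makes four copies and the
two snapshots end up with disjoint blocks; with the index it makes two
copies and both snapshots point at the same new locations.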

H'm... for sane efficiency, you also need to keep track of all metadata
blocks that have been examined and NOT changed, so when you hit them again
traversing the file system structure DAG, you know that you can stop.
Between the two, this amounts to every metadata block on the file system.
Wow!
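
A rough sketch of that walk, in Python, under loud assumptions: this is
a toy model and not how ZFS is implemented. Ptr, Pool, OLD_DISK_END and
migrate() are hypothetical names, a block is either bytes (data) or a
tuple of Ptr (an index block), and "the old disk" is simply every
address below 1000. The forwarded map holds both kinds of forwarding
note; the unchanged set holds index blocks that were examined and left
alone.

```python
import hashlib
from dataclasses import dataclass

OLD_DISK_END = 1000   # hypothetical: addresses below this are being retired


@dataclass(frozen=True)
class Ptr:
    addr: int   # physical address of the target block
    csum: str   # checksum of the target block's content


def checksum(content):
    return hashlib.sha256(repr(content).encode()).hexdigest()


class Pool:
    """Blocks are bytes (data) or a tuple of Ptr (index block)."""

    def __init__(self):
        self.blocks = {}
        self.next_old = 0
        self.next_new = OLD_DISK_END

    def write(self, content, old=False):
        if old:
            addr, self.next_old = self.next_old, self.next_old + 1
        else:
            addr, self.next_new = self.next_new, self.next_new + 1
        self.blocks[addr] = content
        return Ptr(addr, checksum(content))


def migrate(pool, ptr, forwarded, unchanged):
    """Return a pointer to an equivalent tree with nothing on the old disk."""
    if ptr.addr in forwarded:          # already relocated or recreated
        return forwarded[ptr.addr]
    if ptr.addr in unchanged:          # index block already examined, clean
        return ptr
    content = pool.blocks[ptr.addr]
    on_old = ptr.addr < OLD_DISK_END
    if isinstance(content, bytes):
        if not on_old:
            return ptr                 # data block stays put, no note needed
        new_content = content
    else:
        new_content = tuple(migrate(pool, c, forwarded, unchanged)
                            for c in content)
    if on_old or new_content != content:
        new_ptr = pool.write(new_content)   # never overwrite in place
        forwarded[ptr.addr] = new_ptr
        return new_ptr
    unchanged.add(ptr.addr)
    return ptr


pool = Pool()
a = pool.write(b"A", old=True)
b = pool.write(b"B")
c = pool.write(b"C")
dirty = pool.write((a, b))       # index block that points at the old disk
clean = pool.write((c,))         # index block with nothing to move
snap1 = pool.write((dirty, clean))
snap2 = pool.write((dirty, clean, b))

forwarded, unchanged = {}, set()
new1 = migrate(pool, snap1, forwarded, unchanged)
new2 = migrate(pool, snap2, forwarded, unchanged)
```

After both roots are walked, every index block in the toy pool sits in
either forwarded or unchanged, and the two new roots still share the
recreated copy of "dirty". The note for the moved data block keeps its
checksum; the note for the recreated index block does not, which is the
"contents are different" case.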

Well, at least that gives you an upper limit on the size needed.
One block forwarding entry per data block on the migrated-from disk,
plus one index-forwarding entry (which may be larger, if it contains
the new block checksum) for each index block on the entire file system.
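
As a back-of-the-envelope version of that bound, here is a small Python
sketch. Everything numeric in it is an assumption for illustration:
8-byte addresses, a 32-byte checksum, and the example pool sizes are
invented, and forwarding_index_bound() is a hypothetical helper, not
anything from ZFS.

```python
def forwarding_index_bound(data_blocks_on_old_disk, index_blocks_total,
                           addr_bytes=8, csum_bytes=32):
    """Worst-case size in bytes of the migration index.

    Entry sizes are assumptions: a data forwarding note is
    (old address, new address); an index note also carries the new
    checksum.
    """
    data_entry = 2 * addr_bytes
    index_entry = 2 * addr_bytes + csum_bytes
    return (data_blocks_on_old_disk * data_entry
            + index_blocks_total * index_entry)


# Hypothetical pool: a 500 GiB disk of 128 KiB data blocks being
# retired, and 100,000 index blocks across the whole file system.
data_blocks = (500 * 2**30) // (128 * 2**10)
bound = forwarding_index_bound(data_blocks, 100_000)
```

With those made-up numbers the data-block notes dominate, so the size
of the index scales mostly with the disk being retired, not with the
rest of the pool.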

Ouch.

(And, of course, all of this has to be done on a live file system.)