Re: (reiserfs) Re: File systems are semantically impoverished compared

Albert D. Cahalan (acahalan@cs.uml.edu)
Sat, 26 Jun 1999 21:20:46 -0400 (EDT)


Alexander Viro writes:
> On Sun, 27 Jun 1999, Hans Reiser wrote:

>> Each layer with its own pointer structure adds additional pointers
>> because in a single tree system the depth of accesses is log N, but in a
>> two tree system the depth is 2 log(N/2), which is much more than log N.
>
> WHAT??? 2 log(sqrt(n)), please (assuming as you apparently did
> that depth is divided evenly).

Hans may be wrong, but you are obviously wrong as well. log(n) is not
just close to 2 log(sqrt(n)); the two are identical, since
2 log(sqrt(n)) = 2 * (1/2) * log(n) = log(n). The asymptotics agree,
but there is still an extra layer with an extra cost.

There are two parts to this: path lookup and block lookup.

Normal paths, paths that extend into kernel-supported multi-fork files,
and paths that extend into *.doc files all form the same kind of tree.
When using a *.doc file, though, there must be _two_ hash tables for
the lookup cache, because kernel and user space cannot share one.
Two hash table lookups clearly take longer than one, even though the
single combined table would need to be larger. At the extreme, P hash
table lookups for P objects cost O(P).
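
To make that concrete, here is a toy sketch in C. Everything in it is
hypothetical (the names, the sizes, the flat cost model -- none of it
comes from real kernel code); it only shows the shape of the problem:
one logical name, two disjoint caches, two lookups.

#include <stdio.h>
#include <string.h>

#define NBUCKETS 64

struct entry {
	const char *name;
	struct entry *next;
};

struct cache {
	struct entry *bucket[NBUCKETS];
	int probes;			/* work counter, for illustration */
};

static unsigned hash(const char *s)
{
	unsigned h = 0;
	while (*s)
		h = h * 31 + (unsigned char) *s++;
	return h % NBUCKETS;
}

static void insert(struct cache *c, struct entry *e)
{
	unsigned h = hash(e->name);
	e->next = c->bucket[h];
	c->bucket[h] = e;
}

static int lookup(struct cache *c, const char *name)
{
	struct entry *e;
	for (e = c->bucket[hash(name)]; e; e = e->next) {
		c->probes++;
		if (strcmp(e->name, name) == 0)
			return 1;
	}
	return 0;
}

int main(void)
{
	struct cache kernel_cache = { {0}, 0 };	/* stand-in for the dcache */
	struct cache app_cache = { {0}, 0 };	/* the *.doc reader's own cache */
	struct entry e1 = { "file.doc", 0 };
	struct entry e2 = { "part1", 0 };

	insert(&kernel_cache, &e1);	/* kernel caches the path to the file */
	insert(&app_cache, &e2);	/* the app caches the names inside it */

	/* One logical name, "file.doc/part1", needs a lookup in each
	   cache, because the two sides cannot share entries. */
	lookup(&kernel_cache, "file.doc");
	lookup(&app_cache, "part1");
	printf("%d probe(s) in the kernel cache + %d in the app cache\n",
	       kernel_cache.probes, app_cache.probes);
	return 0;
}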

Let's consider a document with P parts, each composed of B blocks.
The name lookup has already given us the root of the allocation tree
for the desired part. When operating directly on a block device
(via kernel support), block lookups are O(log(B)). When operating
within a normal file, access performance is much worse. You start
with the above, but every one of those blocks must also be looked up
in the enclosing file's allocation tree, at a cost of O(log(B*P)).
This is the filesystem overhead. So the total is O(log(B) + log(B*P)),
also known as O(log(P) + 2 log(B)). That is much worse than O(log(B)).
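
Plugging made-up sizes into that formula makes the gap visible. This
is just the arithmetic from the previous paragraph, not a benchmark;
B and P are invented for illustration:

#include <math.h>
#include <stdio.h>

int main(void)
{
	double B = 1024;	/* blocks per part -- made-up size */
	double P = 1024;	/* parts per document -- made-up size */

	double direct = log2(B);		/* kernel support: one tree */
	double nested = log2(B) + log2(B * P);	/* file-in-file: two trees */

	printf("direct lookup: %.0f tree steps\n", direct);	/* 10 */
	printf("nested lookup: %.0f tree steps\n", nested);	/* 30 */
	return 0;
}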

Oh, and that applies to the path lookup too. Normal kernel-supported
directories are done on the physical media. So the part of a path
lookup that occurs within a userspace document pays the extra
O(log(B*P)) block-mapping cost on every directory block it touches.
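
Toy numbers again (same invented B and P as above, still not a
benchmark) for a lookup that walks several directory levels inside
the document; the extra term is paid at every level:

#include <math.h>
#include <stdio.h>

int main(void)
{
	double B = 1024, P = 1024;	/* same made-up sizes as before */
	int depth;

	/* Walking `depth' directory levels inside the document: each
	   level's block reads go through the enclosing file's
	   allocation tree, so the extra O(log(B*P)) term is paid at
	   every step. */
	for (depth = 1; depth <= 4; depth++)
		printf("depth %d: native %3.0f steps, in-document %3.0f steps\n",
		       depth,
		       depth * log2(B),
		       depth * (log2(B) + log2(B * P)));
	return 0;
}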

>> Recovery is necessarily much uglified, more than twice as ugly,
>> and this makes performance worse since performance normally has
>> recovery as the primary crippling factor that prevents the use
>> of various desired algorithms.

I like this comment. With a normal filesystem and normal compound
document files, you really should run fsck.doc on all your files
after a system crash. Kernel-supported compound documents let one
fsck handle everything at once. I expect fewer bugs from the single
tool, since it can fix document and filesystem structure with mostly
the same code. I also expect better performance, for the same reasons
I listed above.
