Re: [reiserfs-dev] Re: Ext2 directory index: ALS paper and benchmarks

From: Hans Reiser (
Date: Fri Dec 07 2001 - 16:01:20 EST

Ragnar Kj°rstad wrote:

>On Fri, Dec 07, 2001 at 04:51:33PM +0100, Daniel Phillips wrote:
>>I did try R5 in htree, and at least a dozen other hashes. R5 was the worst
>>of the bunch, in terms of uniformity of distribution, and caused a measurable
>>slowdown in Htree performance. (Not an order of magnitude, mind you,
>>something closer to 15%.)
>That sounds reasonable.

You are more dependent on hash uniformity than we are. We have a
balancing algorithm that manages space, you use hashing to manage your
space. It is a weakness of your approach (which is not to say your
approach is a bad one).

>>An alternative way of looking at this is, rather than R5 causing an order of
>>magnitude improvement for certain cases, something else is causing an order
>>of magnitude slowdown for common cases. I'd suggest attempting to root that
>In the cases I've studied more closely (e.g. maildir cases) the problem
>with reiserfs and e.g. the tea hash is that there is no common ordering
>between directory entries, stat-data and file-data.
>When new files are created in a directory, the file-data tend to be
>allocated somewhere after the last allocated file in the directory. The
>ordering of the directory-entry and the stat-data (hmm, both?) are

no, actually this is a problem for v3. stat data are time of creation
ordered (very roughly speaking)
and directory entries are hash ordered, meaning that ls -l suffers a
major performance penalty.

This might well affect our performance vs. htree, I don't know where
Daniel puts his stat data.

This matter is receiving attention in V4, and Nikita and I need to have
a seminar on it next week.

>however dependent on the hash. So, with something like the tea hash the
>new file will be inserted in the middle of the directory.
>In addition to the random lookup type reads, there are three other common
>scenarios for reading the files:
>1 Reading them in the same order they were created
>The cache will probably not be 100% effective on the
>directory/stat-data, because it's beeing accessed in a random-like
>order. Read-ahead for the file-data on the other hand will be effective.
>2 Reading the files in filename-order
>Some applications (say, ls -l) may do this, but I doubt it's a very
>common accesspattern. Cache-hit for directory-data will be poor, and
>cache-hit for file-data will be poor unless the files were created in
>the same order.
>3 Reading the files in readdir() order.
>This is what I think is the most common access-pattern. I would expect a
>lot of programs (e.g. mail clients using maildir) to read the directory
>and for every filename stat the file and read the data. This will be in
>optimal order for directory-caching, but more importantly it will be
>random-order like access for the file-data.
>I think scenario nr 3 is the one that matters, and I think it is this
>scenario that makes r5 faster than tea in real-life applications on
>reiserfs. (allthough most numbers available are from benchmarks and not
>real life applications).
>The directory content is likely to all fit in cache even for a fairly
>large directory, so cache-misses are not that much of a problem. The
>file-data itself however, will suffer if read-ahead can't start reading
>the next file from disk while the first one is beeing processed.
>I'm counting on Hans or someone else from the reiserfs team to correct
>me if I'm wrong.

Users who want to speedup reiserfs V3 read/stat performance can do so by
copying directories after creating them, and this way readdir order
equals stat data order. Sad, I know. Only a really fanatic sysadmin is
going to create his reiserfs installs using a master image that has
experienced a cp, but it will make things significantly faster if he
does. Green, add this to the FAQ.

We need to fix this, it is a missed opportunity for higher performance.
 V4 I hope.


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at
Please read the FAQ at

This archive was generated by hypermail 2b29 : Fri Dec 07 2001 - 21:00:40 EST