Re: Filesystem optimization..

Michael O'Reilly (michael@metal.iinet.net.au)
30 Dec 1997 09:08:22 +0800


ebiederm+eric@npwt.net (Eric W. Biederman) writes:
> MR> Even in this, there's still a win from not needing to allocate a fixed
> MR> amount of inodes.
>
> And again see btree based filesystems. There is reiserfs in the
> works, as well as my own shmfs filesystem (though because it has
> different priorities, it doesn't yet keep all inodes in the btree),
> but basically with such a beast it is possible to keep inodes in the
> directory tree.

I've had a number of people point these out, but none of them is a
terribly good option for me. I need a stable filesystem, so I'm after
the smallest possible change for the largest gain.

People have also pointed out things like btree-based directory trees,
but btree directories are a win when you have large directories, as
opposed to lots of directories.

The critical function I'm trying to optimize is the latency of the
open() system call.

> MR> In practice, on a large server, it's rare to get a very high level of
> MR> cache hits (a 3 million file filesystem would need 384M of ram just to
> MR> hold the inode tables in the best case, ignoring all the directories,
> MR> the other meta-data, and the on-going disk activity).
>
> Perhaps the directory cache is too small for your machine?

There are around 390,000 directories holding those files. Just how big
did you want the directory cache to get!?

The point is that caching simply won't work. This is something very
close to random open()'s over the entire filesystem. Unless the cache
size is greater than the meta-data, the cache locality will always be
very poor.

So: Given that you _are_ going to get a cache miss, how do you speed
it up? The obvious way is to try and eliminate the separate inode
seek.

> MR> My example case has less than 100 entries per directory. (LOTS of
> MR> directories tho).
>
> Sounds like a case of a too-small directory cache. ext2 has some
> fairly slow directory routines, which I notice whenever I do an ls in
> the /usr/X11R6/man/man3 directory where all of the filenames are too
> large for the cache. It takes forever in part because I run zlibc
> which stats them all, etc.

The filenames are all 8 letters long. The issue isn't the directory
cache. The issue is the (IMHO) large number of seeks needed to read
the first block of a file.

Michael.