Re: Is any file system on Linux appropriate for very large directories?

Drew Eckhardt (drew@poohsticks.org)
Sat, 13 Jul 1996 12:50:37 -0600


In message <31E6A66C.7B19E3F3@amazon.com>, eb@amazon.com writes:
>We have an application here that uses lots of files in a single
>directory. At the time it was set up, it didn't seem to be a problem.
>However, due to Amazon.com's 30 percent per month growth rate, this is
>now getting to be a serious problem due to the time (and kernel lockup)
>required for linear searching of directories.

1. Any reasonable Unix implementation (this includes Linux with
ext2fs) uses a name-to-inode translation cache, so sequential
directory scans within the kernel are usually avoided for names
that have been looked up recently.

2. In some cases (for example, within a news reader) it may be
desirable for implementation or user-interface purposes to keep
many files in ordinary directories (i.e., a directory layout
which mirrors the newsgroup hierarchy is quite intuitive, and
opting for one or more directories of files facilitates
incremental updates using standard tools like ftp).

If you're going to do this, you need to be aware that the standard
POSIX interface to directories (opendir()/readdir()/closedir())
only provides for sequential access to an unsorted set of entries,
and that lookups using this interface are going to be O(n).
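To make the cost concrete, here is a rough sketch of a name lookup
built directly on that interface (find_entry is just an illustrative
name, not part of any real API). Every probe walks the unsorted
entries from the beginning, so k lookups over n entries cost O(k*n):

    #include <dirent.h>
    #include <string.h>

    /* Illustrative only: return 1 if `name' exists in `dir', 0 if
     * not, -1 on error.  readdir() hands back entries in no useful
     * order, so there is nothing smarter to do than scan. */
    static int find_entry(const char *dir, const char *name)
    {
        DIR *d = opendir(dir);
        struct dirent *e;
        int found = 0;

        if (d == NULL)
            return -1;
        while ((e = readdir(d)) != NULL) {
            if (strcmp(e->d_name, name) == 0) {
                found = 1;
                break;
            }
        }
        closedir(d);
        return found;
    }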

>(By the way, this
>application is currently running on Suns, not on Linux, but moving it to
>Linux is an option we are considering.) The "right" solution to this
>problem is to reimplement our application using a "real" database,

Not necessarily.

>but it is possible that it could be solved simply by using a file system
>that uses some kind of hashing for name lookup!

There may be user-land software issues as well: even if the kernel
hashes name lookups, anything that scans the directory through
readdir() still pays the linear cost.

>or any other approach that you can suggest?

If the performance problem is access to the directory entries rather
than opening files, you want to build a higher-level abstraction than
POSIX offers, one that uses some form of caching. In the fast, robust,
and clean implementation I did (owned by my employer, so don't ask for
code), I used a sorted array of entries, rebuilding it (read the entire
directory, then qsort()) whenever a read request could not be satisfied
and the mtime on the directory was >= the last cache update time
(mtime has a one-second granularity and is not purely monotonic, so a
timestamp equal to the last update may still mean new entries).
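
Since I can't post that code, here is a minimal from-scratch sketch of
the same idea. The names (dcache, dcache_load, dcache_lookup) are
invented for illustration, error handling is skeletal, and a real
implementation would want saner memory management:

    #include <dirent.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <time.h>

    struct dcache {
        const char *dir;    /* directory being cached */
        char **names;       /* sorted array of entry names */
        size_t count;
        time_t stamp;       /* time of the last (re)scan */
    };

    static int cmp(const void *a, const void *b)
    {
        return strcmp(*(const char * const *)a,
                      *(const char * const *)b);
    }

    /* Throw away the old cache, re-read the directory, sort. */
    static int dcache_load(struct dcache *c)
    {
        DIR *d = opendir(c->dir);
        struct dirent *e;
        size_t i;

        if (d == NULL)
            return -1;
        for (i = 0; i < c->count; i++)
            free(c->names[i]);
        free(c->names);
        c->names = NULL;
        c->count = 0;
        c->stamp = time(NULL);  /* stamp *before* scanning */
        while ((e = readdir(d)) != NULL) {
            char **tmp = realloc(c->names,
                                 (c->count + 1) * sizeof *tmp);
            if (tmp == NULL) {
                closedir(d);
                return -1;
            }
            c->names = tmp;
            c->names[c->count++] = strdup(e->d_name);
        }
        closedir(d);
        qsort(c->names, c->count, sizeof *c->names, cmp);
        return 0;
    }

    /* Binary-search the cached array; on a miss, rescan only if the
     * directory's mtime is >= our last scan time (>=, not >, because
     * of the one-second granularity), then search again. */
    static int dcache_lookup(struct dcache *c, const char *name)
    {
        struct stat st;
        const char *key = name;

        if (c->names != NULL &&
            bsearch(&key, c->names, c->count,
                    sizeof *c->names, cmp) != NULL)
            return 1;
        if (stat(c->dir, &st) < 0)
            return -1;
        if (c->names != NULL && st.st_mtime < c->stamp)
            return 0;   /* cache provably current: a true miss */
        if (dcache_load(c) < 0)
            return -1;
        return bsearch(&key, c->names, c->count,
                       sizeof *c->names, cmp) != NULL;
    }

The two details that matter are the >= comparison and taking the
timestamp before the scan rather than after; get either wrong and
the cache can go permanently stale.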