Re: Is any file system on Linux appropriate for very large directories?

Adam McKee (amm130@mail.usask.ca)
Mon, 15 Jul 1996 09:25:03 -0600 (CST)


Linus "God" Torvalds wrote:

> The problem with hashed directories (or any other kind of "smart"
> directories) is not the lookup phase: that's pretty simple, and is almost
> always speeded up by the hashing (or the hashing is done badly).

> The _real_ complexity comes in when you start adding/deleting entries.
> You may get faster lookups, but you're almost sure to get slower
> modifications, and what's worse - keeping track of the extra state is
> making the whole thing more complex. That's ok if you're a database, but
> I'm not certain it makes sense for the OS.

Lookups should be more common than updates in most environments, so the
rule "make the common case fast" seems to be served by a hashed directory.
The worst update situation would probably be when you need to grow the
directory -- you would likely have to rewrite the whole thing (double its
size, then rehash all the entries into the new, larger table). However,
that high price would only have to be paid occasionally. I don't see why
the other update operations would necessarily be much more costly. In
fact, many of them (e.g. delete) involve a lookup, so they should be sped
up as well.
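
To make the "grow and rehash" case concrete, here's a rough userspace
sketch of the idea. None of this is real kernel code -- the names
(dir_hash, dir_entry, dir_hash_grow) and the hash function are just made
up for illustration:

    #include <stdlib.h>

    /* One directory entry, chained for collisions. */
    struct dir_entry {
        char name[256];
        unsigned long inode;
        struct dir_entry *next;
    };

    /* The hash table itself; nbuckets is always a power of two. */
    struct dir_hash {
        struct dir_entry **buckets;
        unsigned int nbuckets;
        unsigned int nentries;
    };

    static unsigned int hash_name(const char *name, unsigned int nbuckets)
    {
        unsigned int h = 0;
        while (*name)
            h = h * 31 + (unsigned char) *name++;
        return h & (nbuckets - 1);
    }

    /* The expensive, occasional case: allocate a bucket array twice as
     * large and move every entry onto its new chain. */
    static int dir_hash_grow(struct dir_hash *dh)
    {
        unsigned int new_n = dh->nbuckets * 2;
        struct dir_entry **new_b = calloc(new_n, sizeof(*new_b));
        unsigned int i;

        if (!new_b)
            return -1;

        for (i = 0; i < dh->nbuckets; i++) {
            struct dir_entry *e = dh->buckets[i];
            while (e) {
                struct dir_entry *next = e->next;
                unsigned int b = hash_name(e->name, new_n);
                e->next = new_b[b];
                new_b[b] = e;
                e = next;
            }
        }
        free(dh->buckets);
        dh->buckets = new_b;
        dh->nbuckets = new_n;
        return 0;
    }

Ordinary inserts and deletes only touch one chain, which is why I'd expect
them to stay cheap; the doubling above is the only place where the whole
table gets rewritten.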

However, I understand the additional performance gain could be minimal,
given Linux's caching of directory entries. If most lookups are satisfied
by that cache (?), then the extra performance provided by hashed
directories might not justify the added complexity after all. I was
unaware that Linux had such a cache. Of course, I should have assumed it
did have one... :)

Also, is the Linux scheduler going to be the object of any fun and games
in 2.1.x?

-- Adam