Re: Is any file system on Linux appropriate for very large directories?

Adam McKee (amm130@mail.usask.ca)
Mon, 15 Jul 1996 05:51:27 -0600 (CST)


[ some stuff deleted ]

Linus "God" Torvalds wrote:

> Now, I admit that using a hashed directory lookup strategy (or even just
> sorted directories and binary searches or whatever) is a reasonable thing
> to do, but on the other hand I don't feel it is necessarily the _right_
> thing to do. I don't think the directory structure of a filesystem is
> necessarily meant to be a database on any larger scale, and on a smaller
> scale there are problems with the "faster" lookup strategies (more
> complexity, more overhead for small directories).

Obviously, one needs to weigh the additional performance against the added
complexity. It seems to me that a hashed directory lookup should provide
a nice performance increase in exchange for relatively little added
complexity. Its advantage over a linear search grows in proportion to the
number of entries in the directory, so it should win in every case except
perhaps for _very_ small directories.
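As a rough illustration (this is not ext2's real on-disk layout; the
structure names and the bucket count below are made up for the example),
the difference in C looks something like this:

/* Toy illustration: looking a name up in a flat array is O(n),
 * while a chained hash table makes the expected cost roughly
 * constant, so the relative speedup grows with the number of
 * entries in the directory. */
#include <stdlib.h>
#include <string.h>

struct dent {
    const char *name;
    struct dent *next;          /* chain within one hash bucket */
};

#define NBUCKETS 128

static unsigned hash(const char *name)
{
    unsigned h = 0;
    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h % NBUCKETS;
}

/* Linear search: walks every entry in the worst case. */
struct dent *lookup_linear(struct dent **entries, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(entries[i]->name, name) == 0)
            return entries[i];
    return NULL;
}

/* Hashed search: only walks the (short) chain in one bucket. */
struct dent *lookup_hashed(struct dent *buckets[NBUCKETS], const char *name)
{
    for (struct dent *d = buckets[hash(name)]; d; d = d->next)
        if (strcmp(d->name, name) == 0)
            return d;
    return NULL;
}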

Given a linear directory search, the strategy you suggested (which amounts
to a sort of user-level chained hashing) is very practical, and can be
tailored to the user's requirements. For applications that are to be
ported to various platforms, implementing this user-level strategy ensures
good performance regardless of how the OS does lookups. Still, optimizing
how the OS does lookups should ensure better performance, without
requiring any special effort from the applications programmer or the user.
It would be nice if the need to implement a user-level solution were
dictated only by the need to run the software on an "inferior" OS ;-)
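For the record, a user-level scheme along those lines might hash the file
name into one of a fixed number of subdirectories, so that each directory
stays small enough for even a linear kernel lookup to be cheap. A rough
sketch (the hash function, bucket count, and paths here are arbitrary
choices for the example, not anything you proposed specifically):

/* Spread files across NBUCKETS subdirectories, 00..63, so that no
 * single directory grows large.  The subdirectories must already
 * exist (e.g. created once with mkdir). */
#include <stdio.h>

#define NBUCKETS 64

static unsigned hash_name(const char *name)
{
    unsigned h = 0;
    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h % NBUCKETS;
}

/* Build "basedir/NN/name" in buf. */
void bucket_path(char *buf, size_t len, const char *basedir, const char *name)
{
    snprintf(buf, len, "%s/%02u/%s", basedir, hash_name(name), name);
}

int main(void)
{
    char path[1024];
    bucket_path(path, sizeof(path), "/var/spool/news", "article12345");
    printf("%s\n", path);       /* e.g. /var/spool/news/NN/article12345 */
    return 0;
}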

-- Adam McKee <Adam.McKee@usask.ca>