Re: Is any file system on Linux appropriate for very large directories?

Matthias Urlichs (smurf@smurf.noris.de)
Sat, 13 Jul 1996 22:56:07 +0100


In linux.dev.kernel, article <31E6A66C.7B19E3F3@amazon.com>,
Eric Benson <eb@amazon.com> writes:
> We have an application here that uses lots of files in a single
> directory. At the time it was set up, it didn't seem to be a problem.
> However, due to Amazon.com's 30 percent per month growth rate, this is
> now getting to be a serious problem due to the time (and kernel lockup)
> required for linear searching of directories. (By the way, this

The easy way out of this problem is to replace "/path/to/file" with
"/path/to/XXX/file", where XXX is a hash of the file name. Or the first
letter (ever looked at /usr/lib/terminfo?). Or whatever works for you.
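A minimal sketch of the "/path/to/XXX/file" scheme, assuming Python and using MD5 purely as a cheap way to spread names evenly over buckets (not for security). The function names, the two-hex-digit bucket width, and the optional second level are illustrative choices, not anything prescribed above; the first-letter variant mirrors the terminfo layout.

```python
import hashlib
import os


def hashed_path(base_dir: str, name: str, levels: int = 1, width: int = 2) -> str:
    """Map base_dir/name to base_dir/XX/name, where XX comes from a hash of name.

    MD5 only spreads names over buckets here; it is not used for security.
    With width=2 there are 256 buckets per level; levels=2 gives 65536.
    """
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    buckets = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(base_dir, *buckets, name)


def first_letter_path(base_dir: str, name: str) -> str:
    """terminfo-style layout: base_dir/v/vt100 for the name 'vt100'."""
    if not name:
        raise ValueError("name must be non-empty")
    return os.path.join(base_dir, name[0], name)


if __name__ == "__main__":
    print(hashed_path("/path/to", "file"))
    print(first_letter_path("/usr/lib/terminfo", "vt100"))  # /usr/lib/terminfo/v/vt100
```

A hash keeps the buckets roughly equal in size no matter how the names are distributed, whereas first-letter buckets can become lopsided (lots of names starting with the same letter), which just moves the big-directory problem one level down.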

The hard way out is to play database. That said, storing lots of chunks
of data, possibly all of equal or bounded maximum size, in one big file
is not a problem. You can use db or dbm to map each name to its storage
location in the big file; no problem. If the chunks are small enough,
store them in the db file itself. Easy.
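A rough sketch of the "one big file plus a dbm index" approach, using Python's standard `dbm` module. The `ChunkStore` class, the `INLINE_MAX` threshold of 64 bytes, and the one-byte tag format (`b"I"` for inline data, `b"F"` for an offset/length record) are hypothetical choices for illustration; the post only says to index names with db/dbm and to keep small chunks in the db file itself.

```python
import dbm
import os
import struct
import tempfile

# Hypothetical threshold: chunks up to this many bytes live inside the dbm file.
INLINE_MAX = 64
# Location record for chunks kept in the big file: 8-byte offset, 8-byte length.
_LOCATION = struct.Struct("<QQ")


class ChunkStore:
    """One big append-only data file, indexed by name through dbm."""

    def __init__(self, data_path: str, index_path: str) -> None:
        self._data = open(data_path, "a+b")  # writes always land at the end
        self._index = dbm.open(index_path, "c")  # create the index if missing

    def put(self, name: str, chunk: bytes) -> None:
        key = name.encode("utf-8")
        if len(chunk) <= INLINE_MAX:
            # Small chunk: store it directly in the index, tagged b"I".
            self._index[key] = b"I" + chunk
            return
        # Large chunk: append it to the data file and record where it landed.
        self._data.seek(0, os.SEEK_END)
        offset = self._data.tell()
        self._data.write(chunk)
        self._data.flush()
        self._index[key] = b"F" + _LOCATION.pack(offset, len(chunk))

    def get(self, name: str) -> bytes:
        value = self._index[name.encode("utf-8")]  # raises KeyError if absent
        tag, payload = value[:1], value[1:]
        if tag == b"I":
            return payload
        offset, length = _LOCATION.unpack(payload)
        self._data.seek(offset)
        return self._data.read(length)

    def close(self) -> None:
        self._index.close()
        self._data.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = ChunkStore(os.path.join(tmp, "chunks.dat"), os.path.join(tmp, "index"))
        store.put("greeting", b"hello")
        store.put("blob", b"x" * 1000)
        print(store.get("greeting"), len(store.get("blob")))  # b'hello' 1000
        store.close()
```

Lookups cost one dbm probe plus at most one seek, independent of how many names exist. The sketch deliberately omits a few things: storing a name a second time leaves the old bytes in the data file as dead space (a real setup would need occasional compaction), and nothing here locks against concurrent writers.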

NB:
If you need a real database, there are two network-aware SQL systems on
the latest SuSE Linux CD-ROM: LNX (not that featureful, but essentially
free) and YARD (can do a lot of things, but the CD only has a limited
version). Don't ask me where to FTP the beast(s) from; I don't know.
There's also Postgres95, but I definitely wouldn't trust my business to
it (we tried). We're using LNX now; it's cheaper, and its list of
features is adequate.

> that uses some kind of hashing for name lookup! A quick review of the
> file systems currently available on Linux suggests that the only one
> that uses hashing is the Amiga file system. I don't mean to be

A few others use B-trees. The problem is that B-trees are rather
difficult to write correctly (doubly so in a multitasking environment);
this is why HPFS is read-only and HFS is only somewhat writable, at the
moment.

-- 
The reason the way of the transgressor is hard is because it's so
crowded.
-- Kin Hubbard
-- 
Matthias Urlichs \ noris network GmbH / Xlink-POP Nürnberg
Schleiermacherstraße 12 \ Linux+Internet / EMail: urlichs@noris.de
90491 Nürnberg (Germany) \ Consulting+Programming+Networking+etc'ing
PGP: 1024/4F578875 1B 89 E2 1C 43 EA 80 44 15 D2 29 CF C6 C7 E0 DE
Click <A HREF="http://info.noris.de/~smurf/finger">here</A>. 42