Re: Is any file system on Linux appropriate for very large directories

Ken Raeburn (raeburn@cygnus.com)
03 Aug 1996 16:45:17 -0400


kai@khms.westfalen.de (Kai Henningsen) writes:

> It can handle as many entries with the exact same hash as there are
> elements in the tBucket.E array. I do not consider this a problem.

I would; in fact, not too long ago I visited a potential customer
site that had been bitten by this same deficiency in ndbm. It
shouldn't be too complicated to replace tBucket.E[511] with a chain
pointer to another block. And it shouldn't be too hard to test by
cutting tBucket.E down to 10 entries.
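
Something along these lines, purely as a sketch: I'm guessing at the
field types, and the real code may declare E with a different size;
the point is just that the last slot turns into a block pointer.

struct tEntry {
    unsigned long hash;             /* assumed: hash of the name */
    unsigned long block;            /* assumed: where the entry lives */
};

struct tBucket {
    struct tEntry E[511];           /* one fewer data slot */
    unsigned long overflow_block;   /* 0 = none, else block number of
                                       the next bucket in the chain */
};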

I think building any such limitation into a database or hash scheme
is short-sighted. Granted, if hundreds of your directory entries hash
to the same value, you should probably fix your algorithms to generate
slightly less structured names or something, but I think it should
result in performance degradation (and maybe one warning message to
the console from the fs code, and/or from fsck), not failure.
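
Here's a user-space toy (not real fs code; all the names are mine)
that behaves the way I'd want: when a bucket fills, it chains another
one and prints a single warning, and the only hard failure left is
genuinely running out of space.

#include <stdio.h>
#include <stdlib.h>

#define BUCKET_ENTRIES 10              /* small, to exercise chaining */

struct bucket {
    unsigned long E[BUCKET_ENTRIES];   /* 0 = free slot */
    struct bucket *next;               /* overflow chain */
};

static int warned;

static int add_entry(struct bucket *b, unsigned long ino)
{
    for (;;) {
        int i;
        for (i = 0; i < BUCKET_ENTRIES; i++) {
            if (b->E[i] == 0) {
                b->E[i] = ino;
                return 0;
            }
        }
        if (b->next == NULL) {
            if (!warned) {
                fprintf(stderr, "warning: bucket overflow, chaining\n");
                warned = 1;
            }
            b->next = calloc(1, sizeof *b->next);
            if (b->next == NULL)
                return -1;             /* only real exhaustion fails */
        }
        b = b->next;                   /* slower, but still correct */
    }
}

int main(void)
{
    struct bucket head = { { 0 }, NULL };
    unsigned long ino;

    /* 500 entries all landing in one bucket: degraded, not dead. */
    for (ino = 1; ino <= 500; ino++)
        if (add_entry(&head, ino) != 0)
            return 1;
    return 0;
}

Cutting BUCKET_ENTRIES down is exactly the kind of cheap overflow test
I had in mind above.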

Also, the 4k bucket size worries me a little. Does that mean the
minimum directory size would be something like 6k or 8k? I've had
trees I couldn't copy from BSD to Linux in part because lots of small
directories got bigger when copied. (File block allocation probably
had a hand in it too; there were some directories with lots of files.
But I don't know what the allocation granularity is in ext2 or umsdos
for files, whereas I can observe it for directories.) And that was
just 512 bytes versus 1k; an 8k minimum would be much worse.
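
To put rough (and entirely made-up) numbers on it, here's the
back-of-the-envelope arithmetic for a tree full of nearly-empty
directories:

#include <stdio.h>

int main(void)
{
    const long ndirs = 5000;                    /* a guess at "lots" */
    const long minsize[] = { 512, 1024, 8192 }; /* BSD, ext2, 8k guess */
    int i;

    for (i = 0; i < 3; i++)
        printf("%5ld-byte minimum: %6ld KB just for directories\n",
               minsize[i], ndirs * minsize[i] / 1024);
    return 0;
}

That prints 2500 KB, 5000 KB, and 40000 KB respectively; the first
jump alone was part of why some of my trees wouldn't fit.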