Re: Hashing and directories

From: Goswin Brederlow (goswin.brederlow@student.uni-tuebingen.de)
Date: Thu Mar 08 2001 - 07:42:16 EST


>>>>> " " == Pavel Machek <pavel@suse.cz> writes:

> Hi!
>> I was hoping to point out that in real life, most systems that
>> need to access large numbers of files are already designed to
>> do some kind of hashing, or at least to divide-and-conquer by
>> using multi-level directory structures.

> Yes -- because they're working around kernel slowness.

> I had to do this kind of hashing because the kernel disliked 70000
> HTML files (a copy of train timetables).

> BTW try rm * with 70000 files in a directory -- the expanded command
> line will overflow the kernel's argument-size limit.
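
(The usual fix for the rm * overflow, by the way, is to feed the file
list to xargs, which batches the arguments to fit under the limit.)

As for the userspace workaround described above, a minimal sketch of
hashing files into subdirectories could look like this -- the bucket
count and hash function are arbitrary picks of mine, not from any
particular system:

#include <stdio.h>

#define BUCKETS 256	/* arbitrary: subdirectories 00/ .. ff/ */

/* djb2 string hash -- any reasonably mixing hash would do here */
static unsigned long hash_name(const char *name)
{
	unsigned long h = 5381;

	while (*name)
		h = h * 33 + (unsigned char)*name++;
	return h;
}

/* Map "foo.html" to something like "a7/foo.html", so that no single
 * directory ever holds all 70000 entries and the kernel's linear
 * directory scan stays short. */
static void bucket_path(const char *name, char *out, size_t len)
{
	snprintf(out, len, "%02lx/%s", hash_name(name) % BUCKETS, name);
}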

There are filesystems that use B-trees (reiserfs) or hashing (affs)
for directories.

That way you get O(log n) or even O(1) access time for files. Sadly,
the size of the hash table in affs depends on the block size, and
Linux AFAIK only allows far too small block sizes (512 bytes) for
affs. It was designed for floppies, so the lack of dynamically
resizing hash tables is excusable.
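
To illustrate why the block size caps the table: an affs directory
block holds a header plus one fixed table of hash-chain pointers, so
a small block leaves room for only a few dozen chains and each chain
grows linearly with the file count. A sketch in that spirit (the
constants here are illustrative, not lifted from the driver):

#include <ctype.h>
#include <string.h>

#define BLOCK_SIZE	512			/* affs on Linux, per above */
#define HEADER_LONGS	56			/* illustrative header overhead */
#define HASH_SLOTS	(BLOCK_SIZE / 4 - HEADER_LONGS)	/* 72 chains */

/* Case-insensitive rolling hash in the Amiga style: mix each
 * character in, then reduce modulo the (block-size-bound) table. */
static unsigned int dir_hash(const char *name)
{
	unsigned int h = strlen(name);

	for (; *name; name++)
		h = (h * 13 + toupper((unsigned char)*name)) & 0x7ff;
	return h % HASH_SLOTS;
}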

What could also be done is to keep directories sorted. Creating a
file would cost O(n) time, but a lookup could be done in O(log log n)
most of the time (by interpolation search) given a reasonably uniform
name distribution, and in O(log n) worst case by plain binary search.
This could be done with ext2 without breaking any compatibility. One
would only need to re-sort all directories each time the FS had been
mounted RW by an older ext2 (which may have inserted unsorted
entries), but otherwise nothing changes.
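
What such a lookup could look like -- a sketch of my own, not
existing ext2 code, keyed on the first bytes of each name:

#include <stdint.h>
#include <string.h>

/* Treat the first four bytes of a name as a big-endian integer so we
 * can interpolate positions; short names are zero-padded. */
static uint32_t name_key(const char *s)
{
	uint32_t k = 0;
	int i;

	for (i = 0; i < 4; i++) {
		k <<= 8;
		if (*s)
			k |= (uint8_t)*s++;
	}
	return k;
}

/* Interpolation search over a sorted name array: guess the position
 * from the target key's value relative to the endpoints, then narrow
 * by strcmp as in binary search.  With reasonably uniform keys this
 * probes O(log log n) entries; long common prefixes degrade it, and
 * a real implementation would key on deeper bytes as the range
 * narrows. */
static int dir_lookup(char **names, int n, const char *want)
{
	uint32_t key = name_key(want);
	int lo = 0, hi = n - 1;

	while (lo <= hi) {
		uint32_t klo = name_key(names[lo]);
		uint32_t khi = name_key(names[hi]);
		int mid, c;

		if (key <= klo)
			mid = lo;
		else if (key >= khi)
			mid = hi;
		else
			mid = lo + (int)((uint64_t)(key - klo) *
					 (uint32_t)(hi - lo) / (khi - klo));

		c = strcmp(names[mid], want);
		if (c == 0)
			return mid;	/* found */
		if (c < 0)
			lo = mid + 1;
		else
			hi = mid - 1;
	}
	return -1;			/* not present */
}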

Would you like to write support for this?

Regards,
        Goswin