> kai@khms.westfalen.de (Kai Henningsen) writes:
>
> > It can handle as many entries with the exact same hash as there are
> > elements in the tBucket.E array. I do not consider this a problem.
>
> I would. But then, not too long ago I visited a potential customer
> site that had been bitten by this same deficiency in ndbm. It
How many entries did he have in his database? And how many entries with
identical hash can ndbm hold?
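
For illustration only - the actual tBucket layout isn't quoted in this
thread, and tEntry plus all the field names below are made up for the
sketch; only tBucket.E comes from the discussion - a fixed-capacity
bucket along these lines makes the limit obvious: entries with the
exact same hash all land in the same bucket, so they can never
outnumber the slots in the E array.

#define ENTRIES_PER_BUCKET 127    /* illustrative figure, not from the patch */

typedef struct {
    unsigned long hash;           /* full hash of the directory entry's name */
    unsigned long offset;         /* where the entry lives in the sequential file */
} tEntry;

typedef struct {
    unsigned int count;           /* slots currently in use */
    tEntry E[ENTRIES_PER_BUCKET]; /* entries sharing one hash must all fit here */
} tBucket;
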
> I think building in any such limitation in a database or hash scheme
> is short-sighted. Granted, if hundreds of your directory entries hash
> to the same value, you should probably fix your algorithms to generate
> slightly less structured names or something, but I think it should
If hundreds of your directory entries hash to the same value, then you
have performed a miracle - at least with the hash function in there.
I have this working with about 100 MB worth of message IDs, and the
largest number of entries hashing to the same value that I've seen is
still in the single digits.
And message IDs are just about as "structured" as they get.
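
If you want to check that claim against your own data, something like
the following does the counting. This is not the hash from the patch
under discussion, just a simple multiplicative string hash; the table
size of 65536 is an arbitrary choice for the experiment.

#include <stdio.h>
#include <string.h>

#define NBUCKETS 65536

static unsigned long hash_name(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

int main(void)
{
    static unsigned long count[NBUCKETS];
    char line[1024];
    unsigned long worst = 0, c;

    /* one message ID per line on stdin */
    while (fgets(line, sizeof line, stdin)) {
        line[strcspn(line, "\n")] = '\0';
        c = ++count[hash_name(line) % NBUCKETS];
        if (c > worst)
            worst = c;
    }
    printf("most IDs sharing one hash value: %lu\n", worst);
    return 0;
}
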
> Also the 4k bucket worries me a little. Does that mean the minimum
Change parameters to taste. Nothing sacred about them.
> directory size would be something like 6k or 8k? I've had trees I
It might even be a good idea to add the hash structures only once the
directories grow over a certain size. Note that I keep all the
*information* in one sequential file, just like current Unix directories.
If it's done that way, I'd say that the directory should be larger than,
say, two buckets before we add hashing - maybe even more. If we can't
reduce disk accesses, there's not much point in the overhead, is there?
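
As a sketch of that threshold idea: keep small directories as a plain
sequential file and only build the hash structure once the directory
would span more than a couple of buckets. BUCKET_SIZE and
HASH_THRESHOLD_BUCKETS are illustrative names, not anything from the
patch.

#define BUCKET_SIZE            4096
#define HASH_THRESHOLD_BUCKETS 2

static int directory_wants_hash(unsigned long dir_bytes)
{
    /* below the threshold a linear scan touches about as few blocks
     * as a hash lookup would, so the extra structure buys nothing */
    return dir_bytes > (unsigned long)HASH_THRESHOLD_BUCKETS * BUCKET_SIZE;
}
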
MfG Kai