Re: [PATCH 5/6] rhashtable: support guaranteed successful insertion.

From: NeilBrown
Date: Wed Mar 28 2018 - 03:04:55 EST


On Wed, Mar 28 2018, Herbert Xu wrote:

> On Wed, Mar 28, 2018 at 08:34:19AM +1100, NeilBrown wrote:
>>
>> It is easy to get an -EBUSY insertion failure when .disable_count is
>> enabled, and I did get that. Blindly propagating that up caused lustre
>> to get terribly confused - not too surprising really.
>
> Right, so this failure mode is specific to your patch 6.

I disagree. My patch 6 only makes it common instead of exceedingly
rare. If any table in the list other than the first has a chain with 16
elements, then trying to insert an element with a hash which matches
that chain will fail with -EBUSY. This is theoretically possible
already, though astronomically unlikely. So that case will never be
tested for.

>
> I think I see the problem. As it currently stands, we never
> need growing when we hit the bucket length limit. We only do
> rehashes.
>
> With your patch, you must change it so that we actually try to
> grow the table if necessary when we hit the bucket length limit.

It is hard to know if it is necessary. And making the new table larger
will make the error less likely, but still won't make it impossible. So
callers will have to handle it - just like they currently have to handle
-ENOMEM even though it is highly unlikely (and not strictly necessary).

Are these errors ever actually useful? I thought I had convinced myself
before that they were (to throttle attacks on the hash function), but
they happen even less often than I thought.

>
> Otherwise it will result in the EBUSY that you're seeing.
>
> I laso think that we shouldn't make this a toggle. If we're going
> to do disable_count then it should be unconditionally done for
> everyone.
>
> However, I personally would prefer a percpu elem count instead of
> disabling it completely. Would that be acceptable to your use-case?

Maybe. Reading a percpu counter isn't cheap. Reading it whenever a hash
chain reaches 16 is reasonable, but I think we would want to read it a
lot more often than that. So probably store the last-sampled time (with
no locking) and only sample the counter if last-sampled is more than
jiffies - 10*HZ (???)

In the case in lustre we also shard the LRU list so that adding to the
LRU causes minimal contention. Keeping a shared counter together with
the lru is trivial and summing them periodically is little burden.
Maybe it makes sense to include that functionality if rhashtables so
that it is there for everyone.

A percpu counter uses a lot more memory than atomic_t. Given that some
callers set nelem_hint to 2 or 3, it seem likely that those callers
don't want to waste memory. Should we force them to?

Thanks,
NeilBrown

Attachment: signature.asc
Description: PGP signature