Re: [PATCH 5/6] rhashtable: support guaranteed successful insertion.

From: NeilBrown
Date: Wed Mar 28 2018 - 17:26:36 EST

Next message: David Rientjes: "Re: [PATCH] slab, slub: skip unnecessary kasan_cache_shutdown()"
Previous message: Thomas Gleixner: "Re: [RFC PATCH for 4.17 02/21] rseq: Introduce restartable sequences system call (v12)"
In reply to: Herbert Xu: "Re: [PATCH 5/6] rhashtable: support guaranteed successful insertion."
Next in thread: Herbert Xu: "Re: [PATCH 5/6] rhashtable: support guaranteed successful insertion."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, Mar 28 2018, Herbert Xu wrote:

> On Wed, Mar 28, 2018 at 06:04:40PM +1100, NeilBrown wrote:
>>
>> I disagree. My patch 6 only makes it common instead of exceedingly
>> rare. If any table in the list other than the first has a chain with 16
>> elements, then trying to insert an element with a hash which matches
>> that chain will fail with -EBUSY. This is theoretically possible
>> already, though astronomically unlikely. So that case will never be
>> tested for.
>
> No that's not true. If the table is correctly sized then the
> probability of having a chain with 16 elements is extremely low.

I say "astronomically unlikely", you say "probability .. is extremely
low". I think we are in agreement here.

The point remains that if an error *can* be returned then I have to
write code to handle it and test that code. I'd rather not.

>
> Even if it does happen we won't fail because we will perform
> an immediate rehash. We only fail if it happens right away
> after the rehash (that is, at least another 16 elements have
> been inserted and you're trying to insert a 17th element, all
> while the new hash table has not been completely populated),
> which means that somebody has figured out our hash secret and
> failing in that case makes sense.
>
>> It is hard to know if it is necessary. And making the new table larger
>> will make the error less likely, but still won't make it impossible. So
>> callers will have to handle it - just like they currently have to handle
>> -ENOMEM even though it is highly unlikely (and not strictly necessary).
>
> Callers should not handle an ENOMEM error by retrying. Nor should
> they retry an EBUSY return value.

I never suggested retrying, but I would have to handle it somehow. I'd
rather not.

>
>> Are these errors ever actually useful? I thought I had convinced myself
>> before that they were (to throttle attacks on the hash function), but
>> they happen even less often than I thought.
>
> The EBUSY error indicates that the hash table has essentially
> degenereated into a linked list because somebody has worked out
> our hash secret.

While I have no doubt that there are hashtables where someone could try
to attack the hash, I am quite sure there are others where is such an
attack is meaningless - any code which could generate the required range of
keys, could do far worse things more easily.

>
>> Maybe. Reading a percpu counter isn't cheap. Reading it whenever a hash
>> chain reaches 16 is reasonable, but I think we would want to read it a
>> lot more often than that. So probably store the last-sampled time (with
>> no locking) and only sample the counter if last-sampled is more than
>> jiffies - 10*HZ (???)
>
> We could also take the spinlock table approach and have a counter
> per bucket spinlock. This should be sufficient as you'll contend
> on the bucket spinlock table anyway.

Yes, storing a sharded count in the spinlock table does seem like an
appropriate granularity. However that leads me to ask: why do we have
the spinlock table? Why not bit spinlocks in the hashchain head like
include/linux/list_bl uses?

>
> This also allows us to estimate the total table size and not have
> to always do a last-ditch growth when it's too late.

I don't understand how it can ever be "too late", though I appreciate
that in some cases "sooner" is better than "later"
If we give up on the single atomic_t counter, then we must accept that
the number of elements could exceed any given value. The only promise
we can provide is that it wont exceed N% of the table size for more than
T seconds.

Thanks,
NeilBrown

>
> Cheers,
> --
> Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Attachment: signature.asc
Description: PGP signature

Next message: David Rientjes: "Re: [PATCH] slab, slub: skip unnecessary kasan_cache_shutdown()"
Previous message: Thomas Gleixner: "Re: [RFC PATCH for 4.17 02/21] rseq: Introduce restartable sequences system call (v12)"
In reply to: Herbert Xu: "Re: [PATCH 5/6] rhashtable: support guaranteed successful insertion."
Next in thread: Herbert Xu: "Re: [PATCH 5/6] rhashtable: support guaranteed successful insertion."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]