Re: [PATCH net-next] rhashtable: further improve stability of rhashtable_walk

From: Herbert Xu
Date: Thu Dec 13 2018 - 03:48:09 EST


On Thu, Dec 13, 2018 at 02:48:59PM +1100, NeilBrown wrote:
>
> Yes, you could rcu_free the old one and allocate a new one. Then you
> would have to be ready to deal with memory allocation failure which
> complicates usage (I already don't like that rhashtable_insert() can
> report -ENOMEM!).

Yes, there will be a cost to dealing with allocation failure, but at
least it'll work reliably in all cases. For the intended use-case of
dumping things to user-space, allocation failure is a non-issue.

> > Now you're conflating two different things. Dropping the RCU
> > isn't necessarily slow. We were talking about waiting for an
> > RCU grace period which would only come into play if you were
> > suspending the walk indefinitely. Actually as I said above even
> > there you don't really need to wait.
>
> How would rhashtable_walk_stop() know if it was indefinite or not?

You assume that it's always indefinite, because stop is typically
called when we have run out of memory and must wait for user-space
to read what we have produced so far to free up memory.

> *Not* keeping them all in the hash chain is ideal, but not essential.
> I see three costs with this.
> One is that we would compare the same key multiple times for lookup.
> How much of a problem is that? A failing compare is usually quite quick,
> and most rhltable uses have inline memcmp for comparison (admittedly not
> all).
>
> The second cost is tracking the chain length against elasticity.
> We could flag one object with each key as a 'master' (low bit of the
> 'next' pointer) and only count the masters. When lookup raced with
> remove this might get a slightly incorrect count, but I don't think that
> hurts.
>
> Finally, there is more pointer chasing as the chains are longer.

The biggest problem is that you can no longer return the lookup
result. When you perform a lookup on rhltable you need to return
all the matching objects, not just a random one.
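
To illustrate, here is a minimal userspace sketch of the grouped layout,
where only one object per key sits in the bucket chain and the other
objects with that key hang off it. This is not the kernel code: struct
obj, struct table, table_insert(), table_lookup() and dup_count() are
all made-up names, and locking and RCU are left out entirely. The point
is only that a single returned pointer gives the caller every match.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 8

/* Hypothetical object; the field names are illustrative only. */
struct obj {
	int key;
	int value;
	struct obj *next_key;	/* next distinct key in the bucket chain */
	struct obj *next_dup;	/* next object carrying the same key */
};

struct table {
	struct obj *buckets[NBUCKETS];
};

static unsigned int bucket_of(int key)
{
	return (unsigned int)key % NBUCKETS;
}

/* Chain only the first object per key; hang later ones off it. */
static void table_insert(struct table *t, struct obj *o)
{
	struct obj *head;

	assert(t != NULL && o != NULL);
	o->next_key = NULL;
	o->next_dup = NULL;

	for (head = t->buckets[bucket_of(o->key)]; head; head = head->next_key) {
		if (head->key == o->key) {
			/* Key already present: join its duplicate list. */
			o->next_dup = head->next_dup;
			head->next_dup = o;
			return;
		}
	}

	/* First object with this key: link it into the bucket chain. */
	o->next_key = t->buckets[bucket_of(o->key)];
	t->buckets[bucket_of(o->key)] = o;
}

/* Returns the head of the duplicate list for key, or NULL. */
static struct obj *table_lookup(const struct table *t, int key)
{
	struct obj *head;

	for (head = t->buckets[bucket_of(key)]; head; head = head->next_key)
		if (head->key == key)
			return head;
	return NULL;
}

/* Walk every match reachable from one lookup result. */
static size_t dup_count(const struct obj *head)
{
	size_t n = 0;

	for (; head; head = head->next_dup)
		n++;
	return n;
}
```

If all the objects were kept flat in the bucket chain instead, matches
for one key could be interleaved with objects for other keys, so the
pointer returned by lookup would no longer identify the whole set.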

Cheers,
--
Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt