Re: [PATCH net-next] rhashtable: further improve stability of rhashtable_walk

From: NeilBrown
Date: Wed Dec 12 2018 - 01:41:46 EST

On Wed, Dec 12 2018, Herbert Xu wrote:

> On Wed, Dec 12, 2018 at 11:02:35AM +1100, NeilBrown wrote:
>> So I think this is a real bug - it is quite unlikely to hit, but
>> possibly.
>> You need a chain with at least 2 objects, you need
>> rhashtable_walk_stop() to be called after visiting an object other than
>> the last object, and you need some thread (this or some other) to remove
>> that object from the table.
>> The patch that I posted aims to fix that bug, and only that bug.
>> The only alternative that I can think of is to document that this can
>> happen and advise that a reference should be held to the last visited
>> object when stop/start is called, or in some other way ensure that it
>> doesn't get removed.
> Thanks for reminding me of the issue you were trying to fix.
> So going back into the email archives, I suggested at the very
> start that we could just insert the walker objects into the actual
> hash table. That would solve the issue for both rhashtable and
> rhlist.
> Could we do that rather than using this ordered list that only
> works for rhashtable?

No. that doesn't work.
When you remove the walker object from the hash chain, you need to wait
for the RCU grace period to expire before you can safely insert back
into the chain. Inserting into a different chain isn't quite so bad now
that the nulls-marker stuff is working, a lookup thread will notice the
move and retry the lookup.

So you would substantially slow down the rhashtable_walk_start() step.
I've tried to think of various ways to over come this problem, such as
walking backwards through each chain - it is fairly safe to move and
object earlier in the chain - but all the approaches I have tried make
the code much more complex.


> Cheers,
> --
> Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> Home Page:
> PGP Key:

Attachment: signature.asc
Description: PGP signature