RE: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
From: Michael Kelley
Date: Fri Jan 10 2025 - 12:00:01 EST
From: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> Sent: Friday, January 10, 2025 1:28 AM
>
> On Thu, Jan 09, 2025 at 02:15:17AM -0800, Breno Leitao wrote:
> >
> > I would suggest we revert this patch until we investigate further. I'll
> > prepare and send a revert patch shortly.
>
> Sorry, I think it was my addition that broke things. The condition
> for checking whether an entry is inserted is incorrect, thus resulting
> in an underflow of the number of entries after entry removal.
>
> Please test this patch:
>
> ---8<---
> The function rhashtable_insert_one returns NULL iff the
> insertion was successful, so that alone should be tested before
> incrementing nelems. Testing the variable data is redundant, and
> buggy because we will have overwritten the original value of data
> by this point.
>
> Reported-by: Michael Kelley <mhklinux@xxxxxxxxxxx>
> Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock")
> Signed-off-by: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
>
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index bf956b85455a..e196b6f0e35a 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -621,7 +621,7 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
>
>  		rht_unlock(tbl, bkt, flags);
>
> -		if (PTR_ERR(data) == -ENOENT && !new_tbl) {
> +		if (!new_tbl) {
>  			atomic_inc(&ht->nelems);
>  			if (rht_grow_above_75(ht, tbl))
>  				schedule_work(&ht->run_work);
> --
This patch fixes the problem I saw with VMs in the Azure cloud. Thanks!
Michael Kelley
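
The failure mode described in the commit message above can be modelled in user space. This is only a minimal sketch, not the kernel code: err_ptr, ptr_err, nelems_after_insert and nelems_after_remove are hypothetical names, and the step that overwrites data with the insert result is an assumption based on the commit message's statement that the original value of data has been overwritten by the time of the check.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space stand-ins for the kernel's ERR_PTR/PTR_ERR helpers. */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }

/*
 * Hypothetical model of one successful insert followed by the nelems check.
 * use_fixed_check != 0 selects the condition from the '+' line of the patch,
 * use_fixed_check == 0 selects the condition from the '-' line.
 */
static int nelems_after_insert(int nelems, int use_fixed_check)
{
	void *data = err_ptr(-ENOENT);	/* lookup result: key not present */
	void *new_tbl = NULL;		/* insert result: NULL means success */

	/* Assumption: data is overwritten from the insert result before the check. */
	if (ptr_err(new_tbl) != -EEXIST)
		data = new_tbl;

	if (use_fixed_check ? !new_tbl
			    : (ptr_err(data) == -ENOENT && !new_tbl))
		nelems++;

	return nelems;
}

/* In this model, removal always decrements the element count. */
static int nelems_after_remove(int nelems)
{
	return nelems - 1;
}
```

In this model, one successful insert followed by one removal leaves the count at -1 with the old condition, because data no longer holds -ENOENT when it is tested and the increment is skipped. With the patched condition the count returns to 0.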