Re: [PATCH v10 18/21] futex: Rework SET_SLOTS

From: Sebastian Andrzej Siewior
Date: Wed Mar 26 2025 - 11:42:22 EST


On 2025-03-12 16:16:31 [+0100], To linux-kernel@xxxxxxxxxxxxxxx wrote:

I am folding this in and testing it, and

> +static bool futex_pivot_pending(struct mm_struct *mm)
> +{
> +	struct futex_private_hash *fph;
> +
> +	guard(rcu)();
> +
> +	if (!mm->futex_phash_new)
> +		return false;
> +
> +	fph = rcu_dereference(mm->futex_phash);
> +	return !rcuref_read(&fph->users);
> +}

> +static int futex_hash_allocate(unsigned int hash_slots, bool custom)

> /*
> - * Will set mm->futex_phash_new on failure;
> - * futex_get_private_hash() will try again.
> + * Only let prctl() wait / retry; don't unduly delay clone().
> */
> - __futex_pivot_hash(mm, fph);
> +again:
> + wait_var_event(mm, futex_pivot_pending(mm));

This wait condition should be !futex_pivot_pending(). Otherwise it
blocks whenever no pivot is pending. We want to wait until the current
futex_phash_new assignment is gone and the ::users counter is >0.
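
To illustrate why the condition as posted blocks, here is a minimal
userspace sketch. It is not kernel code: struct model_mm and all model_*
helpers are made up for illustration and only mirror the logic of the
quoted predicate. The one kernel fact it relies on is that
wait_var_event(var, cond) returns once cond is true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the two fields the predicate reads. */
struct model_mm {
	const void *phash_new;	/* models mm->futex_phash_new */
	unsigned int users;	/* models rcuref_read(&fph->users) */
};

/* Same logic as the quoted futex_pivot_pending(). */
static inline bool model_pivot_pending(const struct model_mm *mm)
{
	if (!mm->phash_new)
		return false;
	return !mm->users;
}

/*
 * wait_var_event(var, cond) returns once cond is true, so the caller
 * keeps sleeping for as long as cond is false.
 */
static inline bool model_keeps_sleeping(bool cond)
{
	return !cond;
}

/* Walks through an idle mm: no replacement pending, one user. */
static inline void model_check_idle(void)
{
	const struct model_mm idle = { .phash_new = NULL, .users = 1 };

	/* As posted: the condition is false, so the waiter sleeps. */
	assert(model_keeps_sleeping(model_pivot_pending(&idle)));
	/* Negated: the condition is true, so the waiter proceeds. */
	assert(!model_keeps_sleeping(!model_pivot_pending(&idle)));
}
```

In this model the prctl() caller on an idle mm sleeps with the posted
condition and proceeds with the negated one.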

This brings me to the wake condition of which we have two:
> @@ -207,6 +203,7 @@ static bool __futex_pivot_hash(struct mm_struct *mm,
>  	}
>  	rcu_assign_pointer(mm->futex_phash, new);
>  	kvfree_rcu(fph, rcu);
> +	wake_up_var(mm);
>  	return true;
>  }
>
> @@ -262,7 +259,8 @@ void futex_private_hash_put(struct futex_private_hash *fph)
>  	 * Ignore the result; the DEAD state is picked up
>  	 * when rcuref_get() starts failing via rcuref_is_dead().
>  	 */
> -	bool __maybe_unused ignore = rcuref_put(&fph->users);
> +	if (rcuref_put(&fph->users))
> +		wake_up_var(fph->mm);
>  }

The one in __futex_pivot_hash() makes sense because ::futex_phash_new is
NULL and the users counter is set to one.
The wake in futex_private_hash_put() doesn't make sense. At this point
we have ::futex_phash_new set and rcuref_read() returns 0, so the waiter
goes back to sleep right after the wake.
Therefore we could remove the wake from futex_private_hash_put().
However, if there is no further futex operation (unlikely) then we are
stuck in wait_var_event() forever. Therefore I would suggest:

diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 65523f3cfe32e..64c7be8df955c 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -210,7 +210,6 @@ static bool __futex_pivot_hash(struct mm_struct *mm,
 	}
 	rcu_assign_pointer(mm->futex_phash, new);
 	kvfree_rcu(fph, rcu);
-	wake_up_var(mm);
 	return true;
 }
 
@@ -1522,10 +1521,10 @@ static bool futex_pivot_pending(struct mm_struct *mm)
 	guard(rcu)();
 
 	if (!mm->futex_phash_new)
-		return false;
+		return true;
 
 	fph = rcu_dereference(mm->futex_phash);
-	return !rcuref_read(&fph->users);
+	return rcuref_is_dead(&fph->users);
 }
 
 static bool futex_hash_less(struct futex_private_hash *a,

-> Attempt to replace if there is no replacement pending
   (futex_phash_new == NULL).
-> If there is a replacement pending (futex_phash_new != NULL) then wait
   until the current private hash is DEAD. This happens once the last
   user is gone, and that last put gives the wakeup.
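
The resulting wait condition can be modelled in userspace as follows. This
is only a sketch under stated assumptions: struct model_mm, model_wait_done()
and model_check_states() are hypothetical names, and the cur_dead flag
stands in for rcuref_is_dead(&fph->users). It mirrors the predicate after
the diff above, nothing more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in, not the kernel's mm_struct. */
struct model_mm {
	const void *phash_new;	/* models mm->futex_phash_new */
	bool cur_dead;		/* models rcuref_is_dead(&fph->users) */
};

/* Same logic as futex_pivot_pending() with the suggested change applied. */
static inline bool model_wait_done(const struct model_mm *mm)
{
	if (!mm->phash_new)
		return true;	/* nothing pending: go and attempt the replacement */
	return mm->cur_dead;	/* pending: done only once the current hash is DEAD */
}

/* Walks through the three states discussed in the text. */
static inline void model_check_states(void)
{
	int pending_hash;	/* any non-NULL address will do */
	const struct model_mm idle = { NULL, false };
	const struct model_mm busy = { &pending_hash, false };
	const struct model_mm drained = { &pending_hash, true };

	assert(model_wait_done(&idle));		/* no replacement pending */
	assert(!model_wait_done(&busy));	/* pending, users still active */
	assert(model_wait_done(&drained));	/* pending, last user gone */
}
```

The only transition from "keep waiting" to "done" is busy -> drained, which
is the point where the last futex_private_hash_put() issues the wakeup.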

Sebastian