Re: [PATCH 2/2] futex: Leave the pi lock stealer in a consistent state upon successful fault

From: Peter Zijlstra
Date: Tue Mar 16 2021 - 07:20:51 EST


On Sun, Mar 14, 2021 at 10:02:24PM -0700, Davidlohr Bueso wrote:
> Before commit 34b1a1ce145 ("futex: Handle faults correctly for PI futexes"),
> any concurrent pi_state->owner fixup would assume that the task that fixed
> things on our behalf also correctly updated the userspace value. This is
> no longer always the case, and can result in scenarios where a lock stealer
> completes a successful FUTEX_LOCK_PI operation after racing, during a fault,
> with an enqueued top waiter in an immutable state, so the uval TID was never
> updated for the stealer. This breaks otherwise expected (and valid)
> semantics and confuses the stealer task:

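The invariant the commit message says gets broken can be stated as a userspace post-condition: if FUTEX_LOCK_PI returns success, the TID field of the futex word should name the caller. The sketch below is illustrative only and is not part of the patch. The helpers futex_uval_owned_by() and lock_pi_result_consistent() are hypothetical, and the SKETCH_FUTEX_* constants are local copies of the uapi <linux/futex.h> values (FUTEX_WAITERS, FUTEX_OWNER_DIED, FUTEX_TID_MASK), assuming the usual PI futex layout where the low 30 bits hold the owner's TID.

```c
#include <assert.h>	/* for callers asserting the post-condition */
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Local copies of the uapi <linux/futex.h> values, renamed so this
 * sketch compiles with or without the kernel headers included.
 */
#define SKETCH_FUTEX_WAITERS	0x80000000u
#define SKETCH_FUTEX_OWNER_DIED	0x40000000u
#define SKETCH_FUTEX_TID_MASK	0x3fffffffu

/*
 * Hypothetical helper: does the futex word @uval name @tid as owner?
 * The WAITERS and OWNER_DIED bits are masked off; only the TID counts.
 */
static bool futex_uval_owned_by(uint32_t uval, pid_t tid)
{
	return (uval & SKETCH_FUTEX_TID_MASK) == (uint32_t)tid;
}

/*
 * Hypothetical post-condition for a FUTEX_LOCK_PI caller: @ret is the
 * syscall result, @uval the futex word read afterwards, @self the
 * caller's TID. A failed lock imposes no ownership requirement.
 * In the race described above, ret == 0 but the word does not carry
 * the stealer's TID, so this returns false.
 */
static bool lock_pi_result_consistent(int ret, uint32_t uval, pid_t self)
{
	if (ret != 0)
		return true;
	return futex_uval_owned_by(uval, self);
}
```

A test harness might pass the FUTEX_LOCK_PI return value, the futex word, and gettid() to lock_pi_result_consistent() and assert on the result; the inconsistent state from the commit message is exactly the case where the syscall reports success but the check fails.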

> ---
> kernel/futex.c | 16 +++++++++++++---
> 1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index ded7af2ba87f..95ce10c4e33d 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -2460,7 +2460,6 @@ static int __fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
>
> case -EAGAIN:
> cond_resched();
> - err = 0;
> break;
>
> default:
> @@ -2474,11 +2473,22 @@ static int __fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
> /*
> * Check if someone else fixed it for us:
> */
> - if (pi_state->owner != oldowner)
> + if (pi_state->owner != oldowner) {
> + /*
> + * The change might have come from the rare immutable
> + * state below, which leaves the userspace value out of
> + * sync. But if we are the lock stealer and can update
> + * the uval, do so, instead of reporting a successful
> + * lock operation with an invalid user state.
> + */
> + if (!err && argowner == current)
> + goto retry;
> +
> return argowner == current;
> + }
>
> /* Retry if err was -EAGAIN or the fault in succeeded */
> - if (!err)
> + if (err == -EAGAIN || !err)
> goto retry;
>

IIRC we made the explicit choice to never loop here. That saves having
to worry about getting stuck in in-kernel loops.

Userspace triggering the case where the futex goes corrupt is UB; after
that, we have no obligation for anything to still work. It's on them;
they get to deal with the remaining bits.