Re: [patch] futex: Cure exit race

From: Peter Zijlstra
Date: Mon Dec 10 2018 - 11:02:16 EST


On Mon, Dec 10, 2018 at 04:23:06PM +0100, Thomas Gleixner wrote:

> kernel/futex.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 53 insertions(+), 4 deletions(-)
>
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1148,11 +1148,60 @@ static int attach_to_pi_state(u32 __user
> return ret;
> }
>
> +static int handle_exit_race(u32 __user *uaddr, u32 uval, struct task_struct *tsk)
> +{
> + u32 uval2;
> +
> + /*
> + * If PF_EXITPIDONE is not yet set try again.
> + */
> + if (!(tsk->flags & PF_EXITPIDONE))
> + return -EAGAIN;
> +
> + /*
> + * Reread the user space value to handle the following situation:
> + *
> + * CPU0                            CPU1
> + *
> + * sys_exit()                      sys_futex()
> + *  do_exit()                       futex_lock_pi()
> + *   exit_signals(tsk)               No waiters:
> + *    tsk->flags |= PF_EXITING;      *uaddr == 0x00000PID
> + *  mm_release(tsk)                  Set waiter bit
> + *   exit_robust_list(tsk) {         *uaddr = 0x80000PID;

Just to clarify, this is the call chain sys_futex() -> futex_lock_pi() ->
futex_lock_pi_atomic(), where we do:

	lock_pi_update_atomic();	// changes the futex word
	attach_to_pi_owner();		// possibly returns -ESRCH after changing the word

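To make the ordering concrete, here is a rough userspace model in C11, not
kernel code. The model_* names and MODEL_WAITERS are made up for
illustration; only the waiters bit value (0x80000000, as in the quoted
diagram) and the -EAGAIN/-ESRCH results come from the discussion. It shows
that by the time the attach step reports -ESRCH, the word has already been
modified:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_WAITERS 0x80000000u	/* hypothetical name for the waiters bit */

/* Step 1: swap uval for newval, or ask the caller to retry if the word moved. */
int model_update_word(_Atomic uint32_t *word, uint32_t uval, uint32_t newval)
{
	uint32_t expected = uval;

	if (!atomic_compare_exchange_strong(word, &expected, newval))
		return -EAGAIN;
	return 0;
}

/* Step 2: stand-in for the attach step; fails if the owner is exiting. */
int model_attach(bool owner_exiting)
{
	return owner_exiting ? -ESRCH : 0;
}

/* Both steps in order: the word is updated before attach can fail. */
int model_lock_pi_atomic(_Atomic uint32_t *word, bool owner_exiting)
{
	uint32_t uval = atomic_load(word);
	int ret = model_update_word(word, uval, uval | MODEL_WAITERS);

	if (ret)
		return ret;
	return model_attach(owner_exiting);
}

/* Usage: the call fails with -ESRCH, yet the waiters bit is already set. */
void model_ordering_selfcheck(void)
{
	_Atomic uint32_t word = 0x00000123u;

	assert(model_lock_pi_atomic(&word, true) == -ESRCH);
	assert(atomic_load(&word) == (0x00000123u | MODEL_WAITERS));
}
```
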

> + *      Set owner died               attach_to_pi_owner() {
> + *    *uaddr = 0xC0000000;            tsk = get_task(PID);
> + *   }                                if (!tsk->flags & PF_EXITING) {
> + *  ...                                 attach();
> + *  tsk->flags |= PF_EXITPIDONE;      } else {
> + *                                      if (!(tsk->flags & PF_EXITPIDONE))
> + *                                        return -EAGAIN;
> + *                                      return -ESRCH; <--- FAIL
> + *                                    }
> + *
> + * Returning ESRCH unconditionally is wrong here because the
> + * user space value has been changed by the exiting task.
> + */
> + if (get_futex_value_locked(&uval2, uaddr))
> + return -EFAULT;
> +
> + /* If the user space value has changed, try again. */
> + if (uval2 != uval)
> + return -EAGAIN;

And this then goes back to futex_lock_pi(), which does a retry loop.
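The shape of such a retry loop, as a minimal userspace sketch. This is not
the kernel's code: struct attempt_state, try_lock_once() and
lock_with_retry() are hypothetical names, and all locking is left out. The
only point is that -EAGAIN is consumed by the loop while every other result
is passed up to the caller:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the state behind one lock attempt. */
struct attempt_state {
	int remaining_eagain;	/* how many attempts report -EAGAIN first */
	int final_result;	/* result once those attempts are used up */
	int calls;		/* number of attempts made so far */
};

/* One attempt: either asks for a retry or returns the final result. */
int try_lock_once(struct attempt_state *st)
{
	st->calls++;
	if (st->remaining_eagain > 0) {
		st->remaining_eagain--;
		return -EAGAIN;
	}
	return st->final_result;
}

/* Retry for as long as the attempt asks for it; pass anything else up. */
int lock_with_retry(struct attempt_state *st)
{
	int ret;

	do {
		ret = try_lock_once(st);
	} while (ret == -EAGAIN);

	return ret;
}

/* Usage: two transient -EAGAIN results, then -ESRCH reaches the caller. */
void retry_selfcheck(void)
{
	struct attempt_state st = { 2, -ESRCH, 0 };

	assert(lock_with_retry(&st) == -ESRCH);
	assert(st.calls == 3);
}
```
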

> + /*
> + * The exiting task did not have a robust list, the robust list was
> + * corrupted or the user space value in *uaddr is simply bogus.
> + * Give up and tell user space.
> + */
> + return -ESRCH;

If it is unchanged, -ESRCH is a valid return value.
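Spelled out as a small userspace model of the quoted function's decision,
for illustration only. model_exit_race() is a hypothetical name;
exit_pi_done stands in for (tsk->flags & PF_EXITPIDONE), and the -EFAULT
path of get_futex_value_locked() is not modelled:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * uval is the value the caller saw earlier, uval2 the value just re-read.
 * Retry while the exit is still in progress or the word has changed;
 * only an unchanged word after the exit completed yields -ESRCH.
 */
int model_exit_race(bool exit_pi_done, uint32_t uval, uint32_t uval2)
{
	if (!exit_pi_done)
		return -EAGAIN;
	if (uval2 != uval)
		return -EAGAIN;
	return -ESRCH;
}

/* Usage: the three cases discussed above, with PID 0x123 as an example. */
void exit_race_selfcheck(void)
{
	/* Exit not finished yet: try again. */
	assert(model_exit_race(false, 0x80000123u, 0x80000123u) == -EAGAIN);
	/* Exiting task set owner died (0xC0000000): try again. */
	assert(model_exit_race(true, 0x80000123u, 0xC0000000u) == -EAGAIN);
	/* Word unchanged after the exit completed: give up. */
	assert(model_exit_race(true, 0x80000123u, 0x80000123u) == -ESRCH);
}
```
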

> +}

There is another caller of futex_lock_pi_atomic():
futex_proxy_trylock_atomic(), which is part of futex_requeue(); that too
does a retry loop on -EAGAIN.

And there is another caller of attach_to_pi_owner(): lookup_pi_state(),
and that too is in futex_requeue() and handles the retry case properly.

Yes, this all looks good.

Acked-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>