Re: [patch 1/5] futex: Make unlock_pi more robust

From: Darren Hart
Date: Mon Jun 16 2014 - 12:11:36 EST


On Wed, 2014-06-11 at 20:45 +0000, Thomas Gleixner wrote:
> The kernel tries to atomically unlock the futex without checking
> whether there is kernel state associated to the futex.
>
> So if user space manipulated the user space value, this will leave
> kernel internal state around associated to the owner task.
>
> For robustness sake, lookup first whether there are waiters on the
> futex. If there are waiters, wake the top priority waiter with all the
> proper sanity checks applied.
>
> If there are no waiters, do the atomic release. We do not have to
> preserve the waiters bit in this case, because a potentially incoming
> waiter is blocked on the hb->lock and will acquire the futex
> atomically. We neither have to preserve the owner died bit. The caller
> is the owner and it was supposed to cleanup the mess.
>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> ---
> kernel/futex.c | 78
> +++++++++++++++++++--------------------------------------
> 1 file changed, 26 insertions(+), 52 deletions(-)
>
> Index: linux/kernel/futex.c
> ===================================================================
> --- linux.orig/kernel/futex.c
> +++ linux/kernel/futex.c
> @@ -1186,22 +1186,6 @@ static int wake_futex_pi(u32 __user *uad
> return 0;
> }
>
> -static int unlock_futex_pi(u32 __user *uaddr, u32 uval)
> -{
> - u32 uninitialized_var(oldval);
> -
> - /*
> - * There is no waiter, so we unlock the futex. The owner died
> - * bit has not to be preserved here. We are the owner:
> - */
> - if (cmpxchg_futex_value_locked(&oldval, uaddr, uval, 0))
> - return -EFAULT;
> - if (oldval != uval)
> - return -EAGAIN;
> -
> - return 0;
> -}

I thought we used that in multiple places, but apparently not. Worth
consolidating.

> -
> /*
> * Express the locking dependencies for lockdep:
> */
> @@ -2401,10 +2385,10 @@ uaddr_faulted:
> */
> static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
> {
> - struct futex_hash_bucket *hb;
> - struct futex_q *this, *next;
> + u32 uninitialized_var(curval), uval, vpid =
> task_pid_vnr(current);
> union futex_key key = FUTEX_KEY_INIT;
> - u32 uval, vpid = task_pid_vnr(current);
> + struct futex_hash_bucket *hb;
> + struct futex_q *match;
> int ret;
>
> retry:
> @@ -2417,57 +2401,47 @@ retry:
> return -EPERM;
>
> ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key,
> VERIFY_WRITE);
> - if (unlikely(ret != 0))
> - goto out;
> + if (ret)
> + return ret;

Looks like you're also trying to move away from a single exit point to
multiple exit points. I prefer the single exit (which you've probably
noticed :-), but it's a subjective thing, and so long as we are not
duplicating cleanup logic, I guess it's fine either way. This change was
not mentioned in the commit message though.

>
> hb = hash_futex(&key);
> spin_lock(&hb->lock);
>
> /*
> - * To avoid races, try to do the TID -> 0 atomic transition
> - * again. If it succeeds then we can return without waking
> - * anyone else up. We only try this if neither the waiters nor
> - * the owner died bit are set.
> - */
> - if (!(uval & ~FUTEX_TID_MASK) &&
> - cmpxchg_futex_value_locked(&uval, uaddr, vpid, 0))
> - goto pi_faulted;
> - /*
> - * Rare case: we managed to release the lock atomically,
> - * no need to wake anyone else up:
> - */
> - if (unlikely(uval == vpid))
> - goto out_unlock;
> -
> - /*
> - * Ok, other tasks may need to be woken up - check waiters
> - * and do the wakeup if necessary:
> - */
> - plist_for_each_entry_safe(this, next, &hb->chain, list) {
> - if (!match_futex (&this->key, &key))
> - continue;
> - ret = wake_futex_pi(uaddr, uval, this);
> + * Check waiters first. We do not trust user space values at
> + * all and we at least want to know if user space fiddled
> + * with the futex value instead of blindly unlocking.
> + */
> + match = futex_top_waiter(hb, &key);
> + if (match) {
> + ret = wake_futex_pi(uaddr, uval, match);
> /*
> - * The atomic access to the futex value
> - * generated a pagefault, so retry the
> - * user-access and the wakeup:
> + * The atomic access to the futex value generated a
> + * pagefault, so retry the user-access and the wakeup:
> */
> if (ret == -EFAULT)
> goto pi_faulted;
> goto out_unlock;
> }
> +
> /*
> - * No waiters - kernel unlocks the futex:
> + * We have no kernel internal state, i.e. no waiters in the
> + * kernel. Waiters which are about to queue themself are stuck

themselves

> + * on hb->lock. So we can safely ignore them. We do neither
> + * preserve the WAITERS bit not the OWNER_DIED one. We are the

We preserve neither the WAITERS bit nor the OWNER_DIED bit.
(the above use of "do" and "not" is incorrect and could easily be
misinterpreted).

> + * owner.

In wake_futex_pi we verify ownership by matching pi_state->owner ==
current, but here the only test is the TID value, which is set by
userspace - which we don't trust...

I'm trying to determine if it matters in this case... if there are no
waiters, is the pi_state still around? If so, it does indeed matter, and
we should be verifying.


> */
> - ret = unlock_futex_pi(uaddr, uval);
> - if (ret == -EFAULT)
> + if (cmpxchg_futex_value_locked(&curval, uaddr, uval, 0))
> goto pi_faulted;
>

This refactoring seems like it would be best done as a prequel patch so
as not to confuse cleanup with functional change. At least that is what
you and others have beaten into me over the years ;-)

> + /*
> + * If uval has changed, let user space handle it.
> + */
> + ret = (curval == uval) ? 0 : -EAGAIN;
> +
> out_unlock:
> spin_unlock(&hb->lock);
> put_futex_key(&key);
> -
> -out:
> return ret;
>

By dropping this you won't return ret, but rather fall through into
pi_faulted... which certainly isn't what you wanted.

The need for better test coverage is very evident now :-)

--
Darren

> pi_faulted:
>


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/