Re: [PATCH -v6 05/13] futex: Change locking rules

From: Darren Hart
Date: Wed Apr 05 2017 - 17:19:05 EST


On Wed, Mar 22, 2017 at 11:35:52AM +0100, Peter Zijlstra wrote:
> Currently futex-pi relies on hb->lock to serialize everything. Since
> hb->lock is giving us problems (PI inversions among other things,
> since on -rt hb lock itself is a rt_mutex), we want to break this up a
> bit.
>
> This patch reworks and documents the locking. Notably, it
> consistently uses rt_mutex::wait_lock to serialize {uval, pi_state}.
> This would allow us to do rt_mutex_unlock() (including deboost)
> without holding hb->lock.
>
> Nothing yet relies on the new locking rules.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> ---
> kernel/futex.c | 165 +++++++++++++++++++++++++++++++++++++++++++++------------
> 1 file changed, 132 insertions(+), 33 deletions(-)
>
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -973,6 +973,39 @@ void exit_pi_state_list(struct task_stru
> *
> * [10] There is no transient state which leaves owner and user space
> * TID out of sync.
> + *
> + *
> + * Serialization and lifetime rules:
> + *
> + * hb->lock:
> + *
> + * hb -> futex_q, relation
> + * futex_q -> pi_state, relation
> + *
> + * (cannot be raw because hb can contain arbitrary amount
> + * of futex_q's)
> + *
> + * pi_mutex->wait_lock:
> + *
> + * {uval, pi_state}
> + *
> + * (and pi_mutex 'obviously')
> + *
> + * p->pi_lock:

This documentation uses a mix of types and common variable names. I'd recommend
some declarations just below "Serialization and lifetime rules:" to help make
this explicit, e.g.:

struct futex_pi_state *pi_state;
struct futex_hash_bucket *hb;
struct rt_mutex *pi_mutex;
struct futex_q *q;
task_struct *p;

> + *
> + * p->pi_state_list -> pi_state->list, relation
> + *
> + * pi_state->refcount:
> + *
> + * pi_state lifetime
> + *
> + *
> + * Lock order:
> + *
> + * hb->lock
> + * pi_mutex->wait_lock
> + * p->pi_lock
> + *
> */
>
> /*
> @@ -980,10 +1013,12 @@ void exit_pi_state_list(struct task_stru
> * the pi_state against the user space value. If correct, attach to
> * it.
> */
> -static int attach_to_pi_state(u32 uval, struct futex_pi_state *pi_state,
> +static int attach_to_pi_state(u32 __user *uaddr, u32 uval,
> + struct futex_pi_state *pi_state,
> struct futex_pi_state **ps)
> {
> pid_t pid = uval & FUTEX_TID_MASK;
> + int ret, uval2;

The uval should be an unsigned type:

u32 uval2;

>
> /*
> * Userspace might have messed up non-PI and PI futexes [3]
> @@ -991,9 +1026,34 @@ static int attach_to_pi_state(u32 uval,
> if (unlikely(!pi_state))
> return -EINVAL;
>
> + /*
> + * We get here with hb->lock held, and having found a
> + * futex_top_waiter(). This means that futex_lock_pi() of said futex_q
> + * has dropped the hb->lock in between queue_me() and unqueue_me_pi(),

This context got here like this:

futex_lock_pi
hb lock
futex_lock_pi_atomic
top waiter
attach_to_pi_state()

The queue_me and unqueue_me_pi both come after this in futex_lock_pi.
Also, the hb lock is dropped in queue_me, not between queue_me and
unqueue_me_pi.

Are you saying that in order to be here, there are at least two tasks contending
for the lock, and one that has come before us has proceeded as far as queue_me()
but has not yet entered unqueue_me_pi(), therefor we know there is a waiter and
it has a pi_state? If so, I think we can make this much clearer by at least
noting the two tasks in play.

...

> @@ -1336,6 +1418,7 @@ static int wake_futex_pi(u32 __user *uad
>
> if (cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)) {
> ret = -EFAULT;
> +

Stray whitespace addition? Not explicitly against coding-style, but I don't
normally see a new line before the closing brace leading to an else...

> } else if (curval != uval) {
> /*
> * If a unconditional UNLOCK_PI operation (user space did not

...

--
Darren Hart
VMware Open Source Technology Center