Re: [PATCH -v5 10/14] futex: Pull rt_mutex_futex_unlock() out from under hb->lock

From: Thomas Gleixner
Date: Tue Mar 07 2017 - 13:15:10 EST


On Sat, 4 Mar 2017, Peter Zijlstra wrote:
> @@ -1035,6 +1037,9 @@ static int attach_to_pi_state(u32 __user
> * has dropped the hb->lock in between queue_me() and unqueue_me_pi(),
> * which in turn means that futex_lock_pi() still has a reference on
> * our pi_state.
> + *
> + * IOW, we cannot race against the unlocked put_pi_state() in
> + * futex_unlock_pi().

That 'IOW' made my head spin for a while. I rather prefer to spell it out
more explicitely:

* The waiter holding a reference on @pi_state protects also
* against the unlocked put_pi_state() in futex_unlock_pi(),
* futex_lock_pi() and futex_wait_requeue_pi() as it cannot go to 0
* and consequentely free pi state before we can take a reference
* ourself.

> */
> WARN_ON(!atomic_read(&pi_state->refcount));
>
> @@ -1378,47 +1383,33 @@ static void mark_wake_futex(struct wake_
> smp_store_release(&q->lock_ptr, NULL);
> }
>
> -static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *top_waiter,
> - struct futex_hash_bucket *hb)

Please add a comment, that the caller must hold a reference on @pi_state

> +static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_pi_state *pi_state)
> {
> - struct task_struct *new_owner;
> - struct futex_pi_state *pi_state = top_waiter->pi_state;
> u32 uninitialized_var(curval), newval;
> + struct task_struct *new_owner;
> + bool deboost = false;
> DEFINE_WAKE_Q(wake_q);
> - bool deboost;
> int ret = 0;
>
> - if (!pi_state)
> - return -EINVAL;
> -
> - /*
> - * If current does not own the pi_state then the futex is
> - * inconsistent and user space fiddled with the futex value.
> - */
> - if (pi_state->owner != current)
> - return -EINVAL;
> -
> raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
> new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
> -
> - /*
> - * When we interleave with futex_lock_pi() where it does
> - * rt_mutex_timed_futex_lock(), we might observe @this futex_q waiter,
> - * but the rt_mutex's wait_list can be empty (either still, or again,
> - * depending on which side we land).
> - *
> - * When this happens, give up our locks and try again, giving the
> - * futex_lock_pi() instance time to complete and unqueue_me().
> - */
> if (!new_owner) {
> - raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
> - return -EAGAIN;
> + /*
> + * Since we held neither hb->lock nor wait_lock when coming
> + * into this function, we could have raced with futex_lock_pi()
> + * such that it will have removed the waiter that brought us
> + * here.

Hmm. That's not entirely correct. There are two cases:

lock_pi()
queue_me() <- Makes it visible as waiter in the hash bucket
unlock(hb->lock)

[1]

rtmutex_futex_lock()

[2]

lock(hb->lock)

Both [1] and [2] are valid reasons why the top waiter is not a rtmutex
waiter.

Thanks,

tglx