Re: [RFC][PATCH 4/4] futex: Rewrite FUTEX_UNLOCK_PI

From: Peter Zijlstra
Date: Fri Nov 25 2016 - 11:19:43 EST


On Thu, Oct 27, 2016 at 10:36:00PM +0200, Thomas Gleixner wrote:
> So that waiter which is now spinning on pi_mutex->lock has already set the
> waiters bit, which you undo. So you created the following scenario:
>
> CPU0 CPU1 CPU2
>
> TID 0x1001 TID 0x1000 TID 0x1002
>
> Acquires futex in user space
> futex value = 0x1000;
>
> futex_lock_pi()
>
> lock_hb()
> set_waiter_bit()
> --> futex value = 0x40001000;
>
> futex_unlock_pi()
> lock_hb()
> attach_to_pi_owner()
> rt_mutex_init_proxy_locked()
> queue_me()
> unlock_hb()
> unlock_hb()
> rt_mutex_lock() wake_futex_pi()
> lock(pi_mutex->lock);
> lock(pi_mutex->lock) new_owner is NULL;
> --> futex value = 0;
> rt_mutex_futex_unlock(pi_mutex);
> unlock(pi_mutex->lock);
> acquire_rt_mutex() return to user space
> Acquires futex in user space
> --> futex value = 0x1002
> fixup_owner()
> fixup_pi_state_owner()
> uval = 0x1002;
> newval = 0x40001001;
> cmpxchg(uval, newval) succeeds
> --> futex value = 0x40001001
>
> Voila. Inconsistent state .... TID 0x1002 is borked now.

OK, I think its much worse...


Consider:


CPU0 (tid=0x1000) CPU1 (tid=0x1001) CPU2 (tid=0x1002)


acquire in userspace
*uaddr = 0x1000

futex_lock_pi()
acq hb->lock
attach_to_pi_owner
pi_state->refcount == 1
queue_me
rel hb->lock
futex_lock_pi()
acq hb->lock
attach_to_pi_state
pi_state->refcount == 2
queue_me
rel hb->lock

futex_unlock_pi()
acq pi_mutex->lock
top_waiter == (CPU1 | CPU2)
wake_futex_pi()
new_owner == NULL
pi_state->owner = &init_task
*uaddr = 0
rt_mutex_futex_unlock()
ret-to-user


acquire in userspace
*uaddr = 0x1000



>From here there's multiple ways to trainwreck the thing, the way you
list above, lets call this A):

rt_mutex_lock()
acq hb->lock
fixup_owner()
cmpxchg(uaddr, 0x1000, 0x40001002)

*BORKED CPU0*

alternatively we can do; B):

CPU3 (or CPU0 with a different tid)

futex_lock_pi()
acq hb->lock
attach_to_pi_state()
-EINVAL : *uaddr != task_pid_vnr(&init_task)



Now, since CPU1 is pinning the hb->waiter relation, the proposed
fixup_owner() -EAGAIN change cannot work either, since, A1):

rt_mutex_lock()
acq hb->lock
fixup_owner() : -EAGAIN
unqueue_me_pi()
put_pi_state
pi_state->refcount == 1
rel hb->lock
goto retry_private

retry_private:
acq hb->lock
attach_to_pi_state() : -EINVAL


Similar things happen with CMP_REQUEUE doing a lookup_pi_state() around
this point.

I'm sorely tempted to do the 'simple' thing and leak FUTEX_WAITERS,
forcing things into the slow path.