Re: commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure
From: Peter Zijlstra
Date: Fri May 19 2017 - 11:49:25 EST
On Thu, May 18, 2017 at 10:34:34AM +0200, Florian Weimer wrote:
> On 05/18/2017 10:31 AM, Peter Zijlstra wrote:
> > But it does that after building the tst-robustpi8 thing, so I seem to
> > have all I need here.
>
> Great, have fun figuring out what's going on. :-/
Please test the below, it cures things for me.
---
Subject: futex,rt_mutex: Fix rt_mutex_cleanup_proxy_lock()
Markus reported that the glibc/nptl/tst-robustpi8 test was failing after
commit:
cfafcd117da0 ("futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()")
Much tracing later I managed to catch the culprit:
ld-linux-x86-64-2161 [019] .... 410.760971: SyS_futex: 00007ffbeb76b028: 80000875 op=FUTEX_LOCK_PI
ld-linux-x86-64-2161 [019] ...1 410.760972: lock_pi_update_atomic: 00007ffbeb76b028: curval=80000875 uval=80000875 newval=80000875 ret=0
ld-linux-x86-64-2165 [011] .... 410.760978: SyS_futex: 00007ffbeb76b028: 80000875 op=FUTEX_UNLOCK_PI
ld-linux-x86-64-2165 [011] d..1 410.760979: do_futex: 00007ffbeb76b028: curval=80000875 uval=80000875 newval=80000871 ret=0
ld-linux-x86-64-2165 [011] .... 410.760980: SyS_futex: 00007ffbeb76b028: 80000871 ret=0000
ld-linux-x86-64-2161 [019] .... 410.760980: SyS_futex: 00007ffbeb76b028: 80000871 ret=ETIMEDOUT
What we see here is 2165 doing an UNLOCK_PI, assigning the lock to the
pending 2161 which then returns with -ETIMEDOUT.
That wrecks the lock state, because now the owner isn't aware it
acquired the lock and we remove the pending robust list entry.
If we then kill 2161, the robust list will not clear out this futex and
the subsequent acquire on this futex will then (correctly) result in
-ESRCH which is unexpected by glibc which then triggers an internal
assertion and dies.
Thomas spotted the bug in rt_mutex_cleanup_proxy_lock(). I had
overlooked that mark_wakeup_next_waiter() results in !rt_mutex_owner()
instead of it pointing to the (next) task. This is to allow lock
stealing.
This means that we need to do a try_to_take_rt_mutex() call in order to
set rt_mutex_owner() before we test to see if its ours.
While there, fix what looks like a merge error which resulted in
rt_mutex_cleanup_proxy_lock() having two calls to
fixup_rt_mutex_waiters() and rt_mutex_wait_proxy_lock() not having any.
Both should have one, since both potentially touch the waiter list.
Reported-by: Markus Trippelsdorf <markus@xxxxxxxxxxxxxxx>
Fixes: 38d589f2fd08 ("futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
---
kernel/locking/rtmutex.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index b95509416909..28cd09e635ed 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1785,12 +1785,14 @@ int rt_mutex_wait_proxy_lock(struct rt_mutex *lock,
int ret;
raw_spin_lock_irq(&lock->wait_lock);
-
- set_current_state(TASK_INTERRUPTIBLE);
-
/* sleep on the mutex */
+ set_current_state(TASK_INTERRUPTIBLE);
ret = __rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE, to, waiter);
-
+ /*
+ * try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
+ * have to fix that up.
+ */
+ fixup_rt_mutex_waiters(lock);
raw_spin_unlock_irq(&lock->wait_lock);
return ret;
@@ -1822,15 +1824,25 @@ bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
raw_spin_lock_irq(&lock->wait_lock);
/*
+ * Do an unconditional try-lock, this deals with the lock stealing
+ * state where __rt_mutex_futex_unlock() -> mark_wakeup_next_waiter()
+ * sets a NULL owner.
+ *
+ * We're not interested in the return value, because the subsequent
+ * test on rt_mutex_owner() will infer that. If the trylock succeeded,
+ * we will own the lock and it will have removed the waiter. If we
+ * failed the trylock, we're still not owner and we need to remove
+ * ourselves.
+ */
+ try_to_take_rt_mutex(lock, current, waiter);
+ /*
* Unless we're the owner; we're still enqueued on the wait_list.
* So check if we became owner, if not, take us off the wait_list.
*/
if (rt_mutex_owner(lock) != current) {
remove_waiter(lock, waiter);
- fixup_rt_mutex_waiters(lock);
cleanup = true;
}
-
/*
* try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
* have to fix that up.