Re: [RFC][PATCH 4/4] futex: Rewrite FUTEX_UNLOCK_PI

From: Peter Zijlstra
Date: Fri Nov 25 2016 - 04:24:06 EST


On Thu, Nov 24, 2016 at 07:58:07PM +0100, Peter Zijlstra wrote:

> OK, so clearly I'm confused. So let me try again.
>
> LOCK_PI does both in one function: lookup_pi_state and fixup_owner. If
> fixup_owner fails with -EAGAIN, we can redo the pi_state lookup.
>
> The requeue stuff, otoh, has one each. WAIT_REQUEUE has fixup_owner(),
> CMP_REQUEUE has lookup_pi_state. Therefore, fixup_owner failing with
> -EAGAIN leaves us dead in the water. There's nothing to go back to and
> retry.
>
> So far, so 'good', right?
>
> Now, as far as I understand this requeue stuff, we have 2 futexes, an
> inner futex and an outer futex. The inner futex is always 'locked' and
> serves as a collection pool for waiting threads.
>
> The requeue crap picks one (or more) waiters from the inner futex and
> sticks them on the outer futex, which gives them a chance to run.
>
> So WAIT_REQUEUE blocks on the inner futex, but knows that if it ever
> gets woken, it will be on the outer futex, and hence needs to
> fixup_owner if the futex and rt_mutex state got out of sync.
>
> CMP_REQUEUE picks the one (or more) waiters of the inner futex and
> sticks them on the outer futex.
>
> So far, so 'good' ?
>
> The thing I'm not entirely sure about is what happens with the outer futex,
> do we first LOCK_PI it before doing CMP_REQUEUE, giving us waiters, and
> then UNLOCK_PI to let them rip? Or do we just CMP_REQUEUE and then let
> whoever wins finish with UNLOCK_PI?
>
>
> In any case, I don't think it matters much, either way we can race
> between the 'last' UNLOCK_PI and getting rt_mutex waiters and then hit
> the &init_task funny state, such that WAIT_REQUEUE waking hits EAGAIN
> and we're 'stuck'.
>
> Now, if we always CMP_REQUEUE to a locked outer futex, then we cannot
> know, at CMP_REQUEUE time, who will win and cannot fix up.

OTOH, if we always first LOCK_PI before doing CMP_REQUEUE, I don't think
we can hit the funny state; LOCK_PI will have fixed it up for us.

So the question is, do we mandate LOCK_PI before CMP_REQUEUE?
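
To make the structural difference concrete, here is a toy model of the two
shapes. This is not kernel code: struct toy_futex, the generation counter and
every toy_* function are made up for illustration, and the "race" is simulated
by a counter rather than by real concurrency. It only shows why a combined
lookup+fixup can loop on -EAGAIN while a split one cannot.

```c
#include <errno.h>

/* Toy stand-in for futex/pi_state bookkeeping; every name here is made up. */
struct toy_futex {
	int generation;    /* bumped whenever the owner changes behind our back */
	int snapshot;      /* generation seen by the last lookup */
	int pending_races; /* how many lookups are followed by a racing change */
};

/* Models lookup_pi_state: record the current state, then maybe lose a race. */
static int toy_lookup_pi_state(struct toy_futex *f)
{
	f->snapshot = f->generation;
	if (f->pending_races > 0) {
		f->pending_races--;
		f->generation++;	/* stands in for a racing UNLOCK_PI */
	}
	return 0;
}

/* Models fixup_owner: -EAGAIN if the snapshot went stale since the lookup. */
static int toy_fixup_owner(struct toy_futex *f)
{
	return f->snapshot == f->generation ? 0 : -EAGAIN;
}

/* LOCK_PI shape: both halves in one function, so -EAGAIN can redo the lookup. */
static int toy_lock_pi(struct toy_futex *f)
{
	int ret;

	do {
		toy_lookup_pi_state(f);
		ret = toy_fixup_owner(f);
	} while (ret == -EAGAIN);

	return ret;
}

/* CMP_REQUEUE shape: only the lookup half, run by the requeueing task. */
static int toy_cmp_requeue(struct toy_futex *f)
{
	return toy_lookup_pi_state(f);
}

/* WAIT_REQUEUE wakeup shape: only the fixup half, run by the woken waiter. */
static int toy_wait_requeue_wakeup(struct toy_futex *f)
{
	return toy_fixup_owner(f);
}
```

With pending_races = 1, toy_lock_pi() gets past the race on its second pass,
while toy_cmp_requeue() followed by toy_wait_requeue_wakeup() ends in -EAGAIN
with no lookup left to redo.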