Re: [PATCH] futex: robust futex maybe never be awaked, on rare situation.

From: Thomas Gleixner
Date: Mon Sep 23 2019 - 10:09:44 EST


On Mon, 23 Sep 2019, Peter Zijlstra wrote:
> On Mon, Sep 23, 2019 at 11:18:20AM +0800, Yunfeng Cui wrote:
> > I used a model checker and found an issue with robust and PI futexes. In
> > the situation below, the exiting owner misses an entry in pi_state_list,
> > while the requester stays blocked and is never woken up.
> >
> > CPU0                              CPU1
> > futex_lock_pi
> >   /* some cs code */
> >                                   futex_lock_pi
> >                                     futex_lock_pi_atomic
> >                                       ...
> >                                       newval = uval | FUTEX_WAITERS;
> >                                       ret = lock_pi_update_atomic(uaddr, uval, newval);
> >                                       ...
> >                                       attach_to_pi_owner
> >                                         ...
> >                                         p = find_get_task_by_vpid(pid);
> >                                         if (!p)
> >                                           return handle_exit_race(uaddr, uval, NULL);
> >                                         ...
> >                                         raw_spin_lock_irq(&p->pi_lock);
> >                                         ...
> >                                         pi_state = alloc_pi_state();
> >                                         ...
> > do_exit->mm_release
> >   if (unlikely(tsk->robust_list)) {
> >     exit_robust_list(tsk);
> >     tsk->robust_list = NULL;
> >   }
> >   if (unlikely(!list_empty(&tsk->pi_state_list)))
> >     exit_pi_state_list(tsk);      /* WILL MISS */
> >                                         list_add(&pi_state->list, &p->pi_state_list);
> >                                         /* WILL BLOCK, NEVER BE WOKEN UP! */
>
> Did you forget/overlook the pi_lock fiddling in do_exit()? I'm thinking
> that would make the above impossible.

Right. I was trying to construct a case which allows the above, but failed
to do so.

Let's look at the exiting task:

exit()
  exit_signals()
    tsk->flags |= PF_EXITING;

  smp_mb();

  raw_spin_lock_irq(&tsk->pi_lock);
  (1)
  raw_spin_unlock_irq(&tsk->pi_lock);

  exit_mm()
    mm_release()
      exit_robust_list()

      if (!list_empty(&tsk->pi_state_list))
        exit_pi_state_list(tsk);

And now at the attaching task:

attach_to_pi_owner()
  raw_spin_lock_irq(&tsk->pi_lock);
  if (tsk->flags & PF_EXITING)
    return;

  pi_state = alloc_pi_state();

  list_add(&pi_state->list, &tsk->pi_state_list);

See (1) above. That's the crucial point.

Once the exiting task has set PF_EXITING and acquired tsk->pi_lock, it is
impossible for the attaching task to queue itself, because it _must_
observe PF_EXITING once it acquires tsk->pi_lock.

If the attacher manages to acquire tsk->pi_lock _before_ the exiting task
does, then it either observes PF_EXITING or it does not.

If it does, it bails out. If it does not, it queues itself on
tsk->pi_state_list and will be cleaned up by the exiting task via
exit_pi_state_list().
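
FWIW, below is a minimal userspace model of that protocol, just for
illustration. It is obviously not kernel code and the names are made up:
pthread_mutex_t stands in for tsk->pi_lock, a C11 fence for smp_mb() and a
counter for tsk->pi_state_list. The assert at the end can never fire:
either the attacher queued before the exiting task reached (1) and the exit
side cleans it up, or the attacher observed PF_EXITING and bailed out.

/*
 * Minimal userspace model of the exit vs. attach protocol -- NOT kernel
 * code. task_struct, PF_EXITING and the pi_state list are mocked up.
 * Build with: gcc -pthread model.c
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define PF_EXITING      0x00000004

static pthread_mutex_t pi_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_uint flags;               /* tsk->flags */
static atomic_int pi_state_count;       /* stands in for tsk->pi_state_list */
static bool queued, bailed;

/* The attach_to_pi_owner() side */
static void *attacher(void *arg)
{
        pthread_mutex_lock(&pi_lock);
        if (atomic_load_explicit(&flags, memory_order_relaxed) & PF_EXITING) {
                bailed = true;          /* Case 2/3: must not queue */
        } else {
                /* Case 1: queue; the exit side has to clean this up */
                atomic_fetch_add_explicit(&pi_state_count, 1,
                                          memory_order_relaxed);
                queued = true;
        }
        pthread_mutex_unlock(&pi_lock);
        return NULL;
}

/* The exit() side */
static void *exiter(void *arg)
{
        atomic_fetch_or_explicit(&flags, PF_EXITING, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);      /* smp_mb() */

        pthread_mutex_lock(&pi_lock);
        /* (1) */
        pthread_mutex_unlock(&pi_lock);

        /* mm_release() does the list_empty() check locklessly ... */
        if (atomic_load_explicit(&pi_state_count, memory_order_relaxed)) {
                /* ... and exit_pi_state_list() drains under pi_lock */
                pthread_mutex_lock(&pi_lock);
                atomic_store_explicit(&pi_state_count, 0, memory_order_relaxed);
                pthread_mutex_unlock(&pi_lock);
        }
        return NULL;
}

int main(void)
{
        pthread_t a, e;

        pthread_create(&a, NULL, attacher, NULL);
        pthread_create(&e, NULL, exiter, NULL);
        pthread_join(a, NULL);
        pthread_join(e, NULL);

        /* Either the attacher bailed out, or it queued and got cleaned up */
        assert(bailed || (queued && atomic_load(&pi_state_count) == 0));
        printf("queued=%d bailed=%d leftover=%d\n",
               queued, bailed, atomic_load(&pi_state_count));
        return 0;
}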

Simplified concurrency picture:

Case 1: Attacher does not see PF_EXITING

CPU 0                                   CPU 1

                                        lock(&tsk->pi_lock);
tsk->flags |= PF_EXITING;               if (!(tsk->flags & PF_EXITING))
                                          queue(pi_state, tsk);
smp_mb();                               unlock(&tsk->pi_lock);
lock(&tsk->pi_lock);
(1)
unlock(&tsk->pi_lock);

if (!list_empty(&tsk->pi_state_list))
  exit_pi_state_list(tsk);

Case 2: Attacher does see PF_EXITING before (1)

CPU 0                                   CPU 1

                                        lock(&tsk->pi_lock);
tsk->flags |= PF_EXITING;               if (tsk->flags & PF_EXITING) {
                                          unlock(&tsk->pi_lock);
smp_mb();                                 return;
lock(&tsk->pi_lock);                    }
(1)
unlock(&tsk->pi_lock);

The attacher CANNOT be queued in tsk->pi_state_list.

Case 3: Attacher does see PF_EXITING after (1)

CPU 0                                   CPU 1

tsk->flags |= PF_EXITING;
smp_mb();
lock(&tsk->pi_lock);                    lock(&tsk->pi_lock);
(1)
unlock(&tsk->pi_lock);
                                        if (tsk->flags & PF_EXITING) {
                                          unlock(&tsk->pi_lock);
                                          return;
                                        }

There are no other possible cases. If the attacher can observe

   !(tsk->flags & PF_EXITING)

_after_ (1), then something is seriously wrong, but not in the futex code.
That would be a massive memory ordering issue.
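
If anyone wants to convince themselves of that claim, it can be written
down as a small userspace litmus test, again with made up names and
pthreads/C11 stand-ins rather than kernel primitives. Here 'past_1' is set
under the lock at point (1); whenever the attach side acquires the lock and
finds it set, the lock handover alone guarantees that PF_EXITING is
visible, so the assert can never trigger.

/*
 * Litmus-style userspace sketch of the ordering claim -- NOT kernel code.
 * Build with: gcc -pthread litmus.c
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PF_EXITING      0x00000004

static pthread_mutex_t pi_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_uint flags;               /* tsk->flags */
static bool past_1;                     /* protected by pi_lock */

static void *exit_side(void *arg)
{
        atomic_fetch_or_explicit(&flags, PF_EXITING, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);      /* smp_mb() */

        pthread_mutex_lock(&pi_lock);
        past_1 = true;                                  /* (1) */
        pthread_mutex_unlock(&pi_lock);
        return NULL;
}

static void *attach_side(void *arg)
{
        pthread_mutex_lock(&pi_lock);
        if (past_1) {
                /* Lock acquired after (1): PF_EXITING must be visible */
                assert(atomic_load_explicit(&flags, memory_order_relaxed)
                       & PF_EXITING);
        }
        pthread_mutex_unlock(&pi_lock);
        return NULL;
}

int main(void)
{
        pthread_t e, a;

        pthread_create(&e, NULL, exit_side, NULL);
        pthread_create(&a, NULL, attach_side, NULL);
        pthread_join(e, NULL);
        pthread_join(a, NULL);
        return 0;
}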

Thanks,

tglx