Re: [RFC][PATCH] Fix a race between rwsem and the scheduler

From: Peter Zijlstra
Date: Tue Aug 30 2016 - 08:19:46 EST


On Tue, Aug 30, 2016 at 06:49:37PM +1000, Balbir Singh wrote:
>
>
> The origin of the issue I've seen seems to be related to
> rwsem spin lock stealing. Basically I see the system deadlock'd in the
> following state

As Nick says (good to see you're back Nick!), this is unrelated to
rwsems.

This is true for pretty much every blocking wait loop out there, they
all do:

for (;;) {
current->state = UNINTERRUPTIBLE;
smp_mb();
if (cond)
break;
schedule();
}
current->state = RUNNING;

Which, if the wakeup is spurious, is just the pattern you need.

> +++ b/kernel/sched/core.c
> @@ -2016,6 +2016,17 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> success = 1; /* we're going to change ->state */
> cpu = task_cpu(p);
>
> + /*
> + * Ensure we see on_rq and p_state consistently
> + *
> + * For example in __rwsem_down_write_failed(), we have
> + * [S] ->on_rq = 1 [L] ->state
> + * MB RMB

There isn't an MB there. The best I can do is UNLOCK+LOCK, which, thanks
to PPC, is _not_ MB. It is however sufficient for this case.

> + * [S] ->state = TASK_UNINTERRUPTIBLE [L] ->on_rq
> + * In the absence of the RMB p->on_rq can be observed to be 0
> + * and we end up spinning indefinitely in while (p->on_cpu)
> + */


/*
* Ensure we load p->on_rq _after_ p->state, otherwise it would
* be possible to, falsely, observe p->on_rq == 0 and get stuck
* in smp_cond_load_acquire() below.
*
* sched_ttwu_pending() try_to_wake_up()
* [S] p->on_rq = 1; [L] P->state
* UNLOCK rq->lock
*
* schedule() RMB
* LOCK rq->lock
* UNLOCK rq->lock
*
* [task p]
* [S] p->state = UNINTERRUPTIBLE [L] p->on_rq
*
* Pairs with the UNLOCK+LOCK on rq->lock from the
* last wakeup of our task and the schedule that got our task
* current.
*/

> + smp_rmb();
> if (p->on_rq && ttwu_remote(p, wake_flags))
> goto stat;
>


Now, this has been present for a fair while, I suspect ever since we
reworked the wakeup path to not use rq->lock twice. Curious you only now
hit it.