Re: [RFC][PATCH 5/5] sched: Reduce ttwu rq->lock contention

From: Oleg Nesterov
Date: Fri Dec 17 2010 - 12:01:49 EST


On 12/16, Peter Zijlstra wrote:
>
> It does the state and on_rq checks first, if we find on_rq,

The problem is that we somehow need to check both on_rq and state
at the same time.

> +try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> {
> - int cpu, orig_cpu, this_cpu, success = 0;
> + int cpu, load, ret = 0;
> unsigned long flags;
> - unsigned long en_flags = ENQUEUE_WAKEUP;
> - struct rq *rq;
>
> - this_cpu = get_cpu();
> + smp_mb();

Yes, we need the full mb(). Without a subsequent spin_lock(), wmb()
can't act as a smp_store_load_barrier() (which we don't have).

> + if (p->se.on_rq && ttwu_force(p, state, wake_flags))
> + return 1;

----- WINDOW -----

> + for (;;) {
> + unsigned int task_state = p->state;
> +
> + if (!(task_state & state))
> + goto out;
> +
> + load = task_contributes_to_load(p);
> +
> + if (cmpxchg(&p->state, task_state, TASK_WAKING) == task_state)
> + break;

Suppose that we have a task T sleeping in TASK_INTERRUPTIBLE state,
and this CPU does try_to_wake_up(TASK_INTERRUPTIBLE). It sees
on_rq == false, so try_to_wake_up() starts the "for (;;)" loop.

However, in the WINDOW above, it is possible that somebody else wakes
T up, and T then changes its own state to TASK_INTERRUPTIBLE again.

Then we set ->state = TASK_WAKING, but T (still running) restores
TASK_RUNNING after us, so our TASK_WAKING is overwritten.
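
To make the interleaving concrete, here is a minimal userspace replay
in C11. It is only a sketch under stated assumptions: struct task, the
state values, waker_claim() and replay_race() are hypothetical stand-ins
and not the kernel code, and the steps are executed by hand in the
problematic order on a single thread rather than by real concurrency.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative values only; not taken from the kernel headers. */
enum { TASK_RUNNING = 0, TASK_INTERRUPTIBLE = 1, TASK_WAKING = 256 };

/* Hypothetical stand-in for the two task_struct fields involved. */
struct task {
    _Atomic unsigned int state;
    bool on_rq;
};

struct race_result {
    bool claimed;             /* did the waker's cmpxchg succeed? */
    unsigned int final_state; /* what ->state holds at the end */
};

/* One pass of the "for (;;)" loop body from the quoted patch. */
static bool waker_claim(struct task *p, unsigned int mask)
{
    unsigned int old = atomic_load(&p->state);

    if (!(old & mask))
        return false;
    return atomic_compare_exchange_strong(&p->state, &old, TASK_WAKING);
}

struct race_result replay_race(void)
{
    struct task t;
    struct race_result r;

    atomic_init(&t.state, TASK_INTERRUPTIBLE);
    t.on_rq = false;

    /* Waker A: the lockless on_rq check sees false and falls through. */
    bool seen_on_rq = t.on_rq;
    assert(!seen_on_rq);

    /* ----- WINDOW: waker B fully wakes T, and T starts running. ----- */
    atomic_store(&t.state, TASK_RUNNING);
    t.on_rq = true;

    /* T, still on the CPU, prepares to sleep again. */
    atomic_store(&t.state, TASK_INTERRUPTIBLE);

    /* Waker A resumes; state matches again, so the cmpxchg succeeds. */
    r.claimed = waker_claim(&t, TASK_INTERRUPTIBLE);

    /* T decides not to sleep after all and restores TASK_RUNNING. */
    atomic_store(&t.state, TASK_RUNNING);

    r.final_state = atomic_load(&t.state);
    return r;
}
```

In this replay the cmpxchg succeeds even though T is on the runqueue
and running, and the TASK_WAKING it stored is gone by the end. The
state check and the on_rq check were each valid on their own, just not
at the same moment.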

Oleg.
