Re: [RFC][PATCH 17/18] sched: Move the second half of ttwu() to the remote cpu

From: Frank Rowand
Date: Fri Jan 28 2011 - 19:05:06 EST

On 01/18/11 08:38, Peter Zijlstra wrote:
> On Fri, 2011-01-07 at 16:22 +0100, Oleg Nesterov wrote:

>> Doesn't __migrate_task() need pi_lock? Consider:

In the reply to this Peter suggests:

"keep ->pi_lock locked over the call to
stop_one_cpu() from set_cpus_allowed_ptr()"

Then Oleg replies to Peter with a possible problem with that.

If I understand Oleg's original suggestion, it solves the problem.
Below I inject the "pi_lock in __migrate_task()" step into Oleg's original
problem description, along with how I think it fixes the problem:

>> 1. A task T runs on CPU_0, it does set_current_state(TASK_INTERRUPTIBLE)
>> 2. some CPU does set_cpus_allowed_ptr(T, new_mask), new_mask doesn't
>> include CPU_0.
>> T is running, cpumask_any_and() picks CPU_1, set_cpus_allowed_ptr()
>> drops pi_lock and rq->lock before stop_one_cpu().
>> 3. T calls schedule() and becomes deactivated.
>> 4. CPU_2 does try_to_wake_up(T, TASK_INTERRUPTIBLE), takes pi_lock
>> and sees on_rq == F.
>> 5. set_cpus_allowed_ptr() resumes and calls stop_one_cpu(cpu => 1).
>> 6. cpu_stopper_thread() runs on CPU_1 and calls __migrate_task().

It attempts to lock p->pi_lock (and thus blocks on the lock held by
CPU_2 in try_to_wake_up() since step 4).

So we do not get to this double rq lock yet:
>> It locks CPU_0 and CPU_1 rq's and checks task_cpu() == src_cpu.

>> 7. CPU_2 calls select_task_rq(), it returns (to simplify) 2.
>> Now try_to_wake_up() does set_task_cpu(T, 2), and calls
>> ttwu_queue()->ttwu_do_activate()->activate_task().

7.1 Now __migrate_task() gets the p->pi_lock that it was blocked on,
continues on to get the double rq lock (the last part of step 6
that got postponed above), discovers that
(task_cpu(p) != src_cpu), and thus skips over the problematic
step 8:

>> 8. __migrate_task() on CPU_1 sees p->on_rq and starts the
>> deactivate/activate dance racing with ttwu_do_activate()
>> on CPU_2.
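
To make the ordering in steps 4 through 7.1 concrete, here is a small
user-space toy model. It is only a sketch under my own assumptions, not
kernel code: every toy_* name is hypothetical, the spinlock is a plain C11
atomic_flag, and the task carries only the cpu and on_rq fields that the
scenario above uses. It shows the migrate side taking pi_lock before the
double rq lock and then re-checking the task's cpu against src_cpu.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_NR_CPUS 3

/* Toy spinlock; a stand-in for the kernel's raw spinlocks. */
struct toy_lock { atomic_flag held; };

static void toy_lock_acquire(struct toy_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin until the holder releases */
}

static void toy_lock_release(struct toy_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical, heavily reduced task: only what the scenario needs. */
struct toy_task {
	struct toy_lock pi_lock;
	int cpu;
	bool on_rq;
};

static struct toy_lock toy_rq_lock[TOY_NR_CPUS];

/* State after step 3: T last ran on 'cpu' and is not on a runqueue. */
static void toy_init(struct toy_task *p, int cpu)
{
	for (int i = 0; i < TOY_NR_CPUS; i++)
		atomic_flag_clear(&toy_rq_lock[i].held);
	atomic_flag_clear(&p->pi_lock.held);
	p->cpu = cpu;
	p->on_rq = false;
}

/* Steps 4 and 7: the waker holds pi_lock across set cpu + activate. */
static void toy_try_to_wake_up(struct toy_task *p, int dest_cpu)
{
	toy_lock_acquire(&p->pi_lock);
	if (!p->on_rq) {
		p->cpu = dest_cpu;
		toy_lock_acquire(&toy_rq_lock[dest_cpu]);
		p->on_rq = true;
		toy_lock_release(&toy_rq_lock[dest_cpu]);
	}
	toy_lock_release(&p->pi_lock);
}

/* Steps 6 and 7.1: pi_lock first, then both rq locks, then re-check. */
static bool toy_migrate_task(struct toy_task *p, int src_cpu, int dest_cpu)
{
	int lo = src_cpu < dest_cpu ? src_cpu : dest_cpu;
	int hi = src_cpu < dest_cpu ? dest_cpu : src_cpu;
	bool moved = false;

	toy_lock_acquire(&p->pi_lock);      /* would block while a waker holds it */
	toy_lock_acquire(&toy_rq_lock[lo]); /* double rq lock, lowest index first */
	if (hi != lo)
		toy_lock_acquire(&toy_rq_lock[hi]);

	if (p->cpu == src_cpu) {            /* skipped if the waker moved T */
		p->cpu = dest_cpu;
		moved = true;
	}

	if (hi != lo)
		toy_lock_release(&toy_rq_lock[hi]);
	toy_lock_release(&toy_rq_lock[lo]);
	toy_lock_release(&p->pi_lock);
	return moved;
}
```

Calling toy_init(&t, 0), then toy_try_to_wake_up(&t, 2), then
toy_migrate_task(&t, 0, 1) replays the order that pi_lock forces: the
wakeup completes first, so the migrate call finds the task on CPU 2 rather
than on src_cpu 0 and returns false without touching it.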
> Drat, yes I think you're right, now you've got me worried about the
> other migration paths too.. however did you come up with that
> scenario? :-)
>
> A simple fix would be to keep ->pi_lock locked over the call to
> stop_one_cpu() from set_cpus_allowed_ptr().
>
> I think the sched_fair.c load-balance code paths are ok because we only
> find a task to migrate after we've obtained both runqueue locks, so even
> if we migrate current, it cannot schedule (step 3).
>
> I'm not at all sure about the sched_rt load-balance paths, will need to
> twist my head around that..

I haven't yet tried to twist my head around either the sched_fair or the
sched_rt load balance paths. But wouldn't it just be safer (especially
given that the load balance code will be modified by somebody at some
point in the future, and that this locking complexity does require head
twisting) to just add the pi_lock in the load-balance paths also?

