On Tue, 26 Sep 2017 09:23:20 +0800
zhouchengming<zhouchengming1@xxxxxxxxxx> wrote:
On 2017/9/26 3:40, Steven Rostedt wrote:Ah crap. This is caused by 7608dec2ce20 ("sched: Drop the rq argument
On Mon, 11 Sep 2017 14:51:49 +0800Thanks for the reply!
Zhou Chengming<zhouchengming1@xxxxxxxxxx> wrote:
push_rt_task() pick the first pushable task and find an eligibleHow can this happen? The try_to_wake_up(task_A) needs to grab the rq
lowest_rq, then double_lock_balance(rq, lowest_rq). So if
double_lock_balance() unlock the rq (when double_lock_balance() return 1),
we have to check if this task is still on the rq.
The problem is that the check conditions are not sufficient:
if (unlikely(task_rq(task) != rq ||
!cpumask_test_cpu(lowest_rq->cpu,&task->cpus_allowed) ||
task_running(rq, task) ||
!rt_task(task) ||
!task_on_rq_queued(task))) {
cpu2 cpu1 cpu0
push_rt_task(rq1)
pick task_A on rq1
find rq0
double_lock_balance(rq1, rq0)
unlock(rq1)
rq1 __schedule
pick task_A run
task_A sleep (dequeued)
lock(rq0)
lock(rq1)
do_above_check(task_A)
task_rq(task_A) == rq1
cpus_allowed unchanged
task_running == false
rt_task(task_A) == true
try_to_wake_up(task_A)
select_cpu = cpu3
enqueue(rq3, task_A)
that task A is on, and we have that rq lock.
/me confused.
-- Steve
After the task_A sleep on cpu1, the try_to_wake_up(task_A) on cpu0 select a different cpu3,
so it will grab the rq3 lock, not the rq1 lock.
to sched_class::select_task_rq()"). Because this code depends on
try_to_wake_up() grabbing the task's rq lock. But it no longer does
that, and it causes this race.
OK, I need to look at this deeper when I'm not so jetlagged and typing
this because I can't sleep at 5am.
Thanks for pointing this out!
It may be fixed by simply grabbing the run queue lock on migration, as
that would sync things up.
Peter?
-- Steve
.