Re: [PATCH] sched: fix bad task migration for rt tasks

From: Schspa Shi
Date: Sat Jun 25 2022 - 08:23:20 EST


Valentin Schneider <vschneid@xxxxxxxxxx> writes:

> On 24/06/22 02:29, Schspa Shi wrote:
>> @@ -1998,12 +1998,15 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
>> * the mean time, task could have
>> * migrated already or had its affinity changed.
>> * Also make sure that it wasn't scheduled on its rq.
>> + * It is possible the task has been running for a while;
>> + * also re-check the task's migration-disabled flag here.
>> */
>> if (unlikely(task_rq(task) != rq ||
>> !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
>
> cf. 95158a89dd50 ("sched,rt: Use the full cpumask for balancing"), this
> made sense to me back then but not so much anymore... Shouldn't this have
> remained a ->cpus_ptr check?
>

It seems it is there for the following scenario:
it allows a higher-priority, non-migratable task to be scheduled quickly
by migrating the current task away to another CPU with lower priority.
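
For context on why ->cpus_mask and ->cpus_ptr differ here (my reading,
simplified; the exact signature varies by kernel version):
migrate_disable()/migrate_enable() only re-point ->cpus_ptr, while
->cpus_mask keeps the task's full user-visible affinity, roughly:

/*
 * Simplified sketch (not a patch): how the migrate_disable() paths
 * interact with the two affinity fields in set_cpus_allowed_common().
 */
void set_cpus_allowed_common(struct task_struct *p,
                             const struct cpumask *new_mask, u32 flags)
{
        if (flags & (SCA_MIGRATE_ENABLE | SCA_MIGRATE_DISABLE)) {
                /* Only ->cpus_ptr is narrowed to / restored from here... */
                p->cpus_ptr = new_mask;
                return;
        }

        /* ...while ->cpus_mask always holds the full affinity. */
        cpumask_copy(&p->cpus_mask, new_mask);
        p->nr_cpus_allowed = cpumask_weight(new_mask);
}

If I read it right, a ->cpus_ptr check in find_lock_lowest_rq() would
always fail for a migration-disabled task, while the ->cpus_mask check
keeps its full affinity visible to the push/pull decisions.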

Considering this, we should retry in that case rather than return with no
lower-priority rq (a rough sketch follows after the code below).

I can upload a new patchset with this handling.

Please refer to the following code:
if (is_migration_disabled(next_task)) {
        struct task_struct *push_task = NULL;
        int cpu;

        if (!pull || rq->push_busy)
                return 0;

        /*
         * Invoking find_lowest_rq() on anything but an RT task doesn't
         * make sense. Per the above priority check, curr has to
         * be of higher priority than next_task, so no need to
         * reschedule when bailing out.
         *
         * Note that the stoppers are masqueraded as SCHED_FIFO
         * (cf. sched_set_stop_task()), so we can't rely on rt_task().
         */
        if (rq->curr->sched_class != &rt_sched_class)
                return 0;

        cpu = find_lowest_rq(rq->curr);
        if (cpu == -1 || cpu == rq->cpu)
                return 0;

        /*
         * Given we found a CPU with lower priority than @next_task,
         * therefore it should be running. However we cannot migrate it
         * to this other CPU, instead attempt to push the current
         * running task on this CPU away.
         */
        push_task = get_push_task(rq);
        if (push_task) {
                raw_spin_rq_unlock(rq);
                stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
                                    push_task, &rq->push_work);
                raw_spin_rq_lock(rq);
        }

        return 0;
}
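
And one possible shape of the retry, only as a rough sketch of the idea
(not the patch I will post; reference counting and races are only roughly
handled): when find_lock_lowest_rq() fails because next_task turned out to
be migration-disabled after the lock was dropped, go back to the retry
label in push_rt_task() so the branch above can push rq->curr away,
instead of giving up:

        /* Sketch only: in push_rt_task(), after find_lock_lowest_rq() fails. */
        lowest_rq = find_lock_lowest_rq(next_task, rq);
        if (!lowest_rq) {
                struct task_struct *task;

                /*
                 * find_lock_lowest_rq() dropped rq->lock, so re-check which
                 * task is the next pushable one before giving up.
                 */
                task = pick_next_pushable_task(rq);
                if (task == next_task && is_migration_disabled(next_task)) {
                        /*
                         * The task is still here but is now migration-
                         * disabled: retry so the is_migration_disabled()
                         * branch above can push rq->curr away and let
                         * next_task run, rather than returning with no
                         * lower-priority rq.
                         */
                        put_task_struct(next_task);
                        goto retry;
                }
        }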

> I'm going to revisit that commit, I evicted too much of it.
>
>> task_running(rq, task) ||
>> !rt_task(task) ||
>> - !task_on_rq_queued(task))) {
>> + !task_on_rq_queued(task) ||
>> + is_migration_disabled(task))) {
>>
>> double_unlock_balance(rq, lowest_rq);
>> lowest_rq = NULL;
>> --
>> 2.24.3 (Apple Git-128)

--

BRs
Schspa Shi