Re: [PATCH] sched: change pulling RT task to be pulling the highest-prio run-queue first

From: Hillf Danton
Date: Fri Jun 03 2011 - 11:11:38 EST

On Tue, May 31, 2011 at 11:00 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> On Sat, 2011-05-28 at 22:34 +0800, Hillf Danton wrote:
>> When pulling, RT tasks are currently pulled from one overloaded run-queue
>> after another. This patch changes that to pull tasks from the highest-prio
>> run-queue first.
> First off, a change like this requires rationale. Preferably, in the
> form of benchmarks and test cases that demonstrate the problems of
> the current scheduler and explain to us how these changes improve the
> situation.
> There is no rationale nor any benchmarks that explain why this is better
> than the current method.

Hi Steven

Thanks for your review, which points out what the patch is missing: a test case.

>> A new function, cpupri_find_prio(), is added to ease pulling in prio sequence.
>> Signed-off-by: Hillf Danton <dhillf@xxxxxxxxx>
>> ---
>> --- tip-git/kernel/sched_rt.c Sun May 22 20:12:01 2011
>> +++ sched_rt.c Sat May 28 21:24:13 2011
>> @@ -1434,18 +1434,33 @@ static void push_rt_tasks(struct rq *rq)
>>                 ;
>>  }
>> +static DEFINE_PER_CPU(cpumask_var_t, high_cpu_mask);
>> +
>>  static int pull_rt_task(struct rq *this_rq)
>>  {
>>         int this_cpu = this_rq->cpu, ret = 0, cpu;
>>         struct task_struct *p;
>>         struct rq *src_rq;
>> +       struct cpumask *high_mask = __get_cpu_var(high_cpu_mask);
>> +       int prio = 0;
>>         if (likely(!rt_overloaded(this_rq)))
>>                 return 0;
>> +loop:
>> +       if (! (prio < this_rq->rt.highest_prio.curr))
>> +               return ret;
>> +
>> +       if (! cpupri_find_prio(&this_rq->rd->cpupri, prio,
>> +                               this_rq->rd->rto_mask, high_mask)) {
>> +               prio++;
>> +               goto loop;
>> +       }
> This loop looks to be expensive in the hot path.

You are right. In the worst case, the introduced overhead is
this_rq->rt.highest_prio.curr bit tests like

if (cp->pri_active[task_prio / BITS_PER_LONG] &
    (1UL << ((BITS_PER_LONG - 1) - (task_prio % BITS_PER_LONG)))) {

which I think slows down the hot path a lot :/
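
To make that worst case concrete, here is a minimal userspace sketch of the scan. It is not the kernel code: struct pri_map, NR_PRIO, pri_set(), pri_test() and scan_below() are hypothetical names made up for illustration, and the per-priority cpumask intersection that a real lookup would also need is left out. Only the bit-test expression is taken from the fragment above.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))
#define NR_PRIO 102 /* hypothetical number of priority levels */
#define PRI_LONGS ((NR_PRIO + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical stand-in for the pri_active bitmap used in the bit test. */
struct pri_map {
	unsigned long pri_active[PRI_LONGS];
};

/* Mask for one priority, using the same bit layout as the fragment above. */
static unsigned long pri_mask(int prio)
{
	return 1UL << ((BITS_PER_LONG - 1) - (prio % BITS_PER_LONG));
}

/* Mark a priority level as active. */
static void pri_set(struct pri_map *m, int prio)
{
	assert(prio >= 0 && prio < NR_PRIO);
	m->pri_active[prio / BITS_PER_LONG] |= pri_mask(prio);
}

/* Test whether a priority level is active. */
static int pri_test(const struct pri_map *m, int prio)
{
	assert(prio >= 0 && prio < NR_PRIO);
	return (m->pri_active[prio / BITS_PER_LONG] & pri_mask(prio)) != 0;
}

/*
 * Scan priorities 0 .. limit-1 in order, one bit test per level.
 * Returns the first active level, or -1 if there is none below limit.
 * *tests receives the number of bit tests performed.
 */
static int scan_below(const struct pri_map *m, int limit, int *tests)
{
	int prio;

	assert(limit >= 0 && limit <= NR_PRIO);
	*tests = 0;
	for (prio = 0; prio < limit; prio++) {
		(*tests)++;
		if (pri_test(m, prio))
			return prio;
	}
	return -1;
}
```

With limit playing the role of this_rq->rt.highest_prio.curr, a map that has no active level below limit costs exactly limit bit tests before the scan gives up, which is the overhead described above.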

> Note, in practice, not many RT tasks are running at the same time. If
> this is not the case, then please explain what situation has multiple RT
> tasks contending for more than one CPU where RT tasks are forced to
> migrate continuously, and this patch fixes the situation.

The situation is hard to construct; I guess it is only captured by

> I understand that the current code looks a bit expensive, as it loops
> through the CPUs that are overloaded, and pulls over the RT tasks
> waiting to run that are of higher priority than the one currently on
> this task. If it picks wrong, it could potentially pull over more than
> one task.
> But in practice (and I've traced this a while back), it seldom ever
> happens.
> But if you see that this code is hitting the slow path constantly, and
> your code shows better performance, and you can demonstrate this via a
> benchmark that I could use to reproduce, then I will consider taking
> these changes.
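
The "picks wrong" case described above can be sketched in userspace as follows. This is only a simplified model under stated assumptions, not the kernel's pull_rt_task(): struct cpu_rq, NR_CPUS and pull_in_cpu_order() are hypothetical names, each overloaded CPU is reduced to the priority of its best waiting task, and a lower number means a higher priority.

```c
#include <assert.h>

#define NR_CPUS 4 /* hypothetical CPU count for the model */

/* Hypothetical per-CPU summary used only by this sketch. */
struct cpu_rq {
	int overloaded; /* non-zero if an RT task is waiting to run there */
	int next_prio;  /* priority of its best waiting task (lower = higher) */
};

/*
 * Visit CPUs in index order and "pull" whenever the waiting task beats
 * the best priority currently on this CPU. *curr_prio is updated after
 * each pull. Returns the number of tasks pulled.
 */
static int pull_in_cpu_order(const struct cpu_rq *rqs, int this_cpu,
			     int *curr_prio)
{
	int cpu, pulled = 0;

	assert(this_cpu >= 0 && this_cpu < NR_CPUS);
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu == this_cpu || !rqs[cpu].overloaded)
			continue;
		if (rqs[cpu].next_prio < *curr_prio) {
			*curr_prio = rqs[cpu].next_prio;
			pulled++;
		}
	}
	return pulled;
}
```

In this model, if the scan meets a prio-30 task before a prio-10 task, both are pulled; if the order is reversed, only the prio-10 task is pulled. That extra pull is the cost the patch tries to avoid by visiting the highest-prio run-queue first.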

Since you have already traced this, I believe the slow path is not being hit constantly.
