Re: [PATCH 2/3] sched/rt: allow pulling unfitting task

From: Qais Yousef
Date: Fri Feb 21 2020 - 06:08:25 EST


On 02/21/20 13:37, Pavan Kondeti wrote:
> Hi Quais,

I know my name breaks the English rules, but there's no u after my Q :)

>
> On Wed, Feb 19, 2020 at 01:43:08PM +0000, Qais Yousef wrote:
>
> [...]
>
> > >
> > > Here rq is source rq from which the task is being pulled. I can't understand
> > > how marking overload condition on source_rq help. Because overload condition
> > > gets cleared in the task dequeue path. i.e dec_rt_tasks->dec_rt_migration->
> > > update_rt_migration().
> > >
> > > Also, the overload condition with nr_running=1 may not work as expected unless
> > > we track this overload condition (due to unfit) separately. Because a task
> > > can be pushed only when it is NOT running. So a task running on silver will
> > > continue to run there until it wakes up next time or another high prio task
> > > gets queued here (due to affinity).
> > >
> > > btw, Are you testing this path by disabling RT_PUSH_IPI feature? I ask this
> > > because, This feature gets turned on by default in our b.L platforms and
> > > RT task migrations happens by the busy CPU pushing the tasks. Or are there
> > > any cases where we can run into pick_rt_task() even when RT_PUSH_IPI is
> > > enabled?
> >
> > I changed the approach to set the overload at wake up now. I think I got away
> > without having to encode the reason in rq->rt.overload.
> >
> > Steve, Pavan, if you can scrutinize this approach I'd be appreciated. It seemed
> > to work fine with my testing.
> >
>
> The 1st patch in this series added the support to return an unfit CPU incase

If you can give your reviewed-by and maybe tested-by for that patch that'd be
great.

> all other BIG CPUs are busy with RT tasks. I believe that change it self
> solves the RT tasks spreading problem if we just remove
> rt_task_fits_capacity() from pick_rt_task(). In fact rt_task_fits_capacity()

Yes I will be removing this.

> can also be removed from switched_to_rt() and task_woken_rt(). Because
> a RT task can be pulled/pushed only if it not currently running. If the

I did actually consider pushing a patch that does just that.

The issue is that I have seen delays in migrating the task to the big CPU,
although the CPU was free. It was hard to hit, but I had runs where the
migration didn't happen at all during the 1s test duration.

I need to use the same overloaded trick in switched_to_rt() too though.

> intention is to push/pull a running RT task from an unfit CPU to a BIG CPU,
> that won't work as we are not waking any migration/X to carry the migration.

Hmm, I thought this should happen as a side effect of the push/pull functions.

>
> If an CPU has 2 RT tasks (excluding the running RT task), then we have a
> choice about which RT task to be pushed based on the destination CPU's
> capacity. Are we trying to solve this problem in this patch?

If there are 2 tasks on the same CPU (rq), then it's overloaded and the push
should be triggered anyway, no?

What I'm trying to solve is if a task has woken up on the wrong CPU, or has
just switched to RT on the wrong CPU, then we want to push ASAP.

We can rely on select_task_rq_rt() to do the work, but I have seen big delays
for this to trigger.

>
> Please see the below comments as well and let me know if we are on the
> same page or not.
>
> >
> >
> > When implemented RT Capacity Awareness; the logic was done such that if
> > a task was running on a fitting CPU, then it was sticky and we would try
> > our best to keep it there.
> >
> > But as Steve suggested, to adhere to the strict priority rules of RT
> > class; allow pulling an RT task to unfitting CPU to ensure it gets a
> > chance to run ASAP.
> >
> > To better handle the fact the task is running on unfit CPU, when it
> > wakes up mark it as overloaded which will cause it to be pushed to
> > a fitting CPU when it becomes available.
> >
> > The latter change requires teaching push_rt_task() how to handle pushing
> > unfit task.
> >
> > If the unfit task is the only pushable task, then we only force the push
> > if we find a fitting CPU. Otherwise we bail out.
> >
> > Else if the task is higher priorirty than current task, then we
> > reschedule.
> >
> > Else if the rq has other pushable tasks, then we push the unfitting task
> > anyway to reduce the pressure on the rq even if the target CPU is unfit
> > too.
> >
> > Suggested-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
> > Signed-off-by: Qais Yousef <qais.yousef@xxxxxxx>
> > ---
> > kernel/sched/rt.c | 52 +++++++++++++++++++++++++++++++++++++++++++----
> > 1 file changed, 48 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index 6d959be4bba0..6d92219d5733 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -1658,8 +1658,7 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
> > static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
> > {
> > if (!task_running(rq, p) &&
> > - cpumask_test_cpu(cpu, p->cpus_ptr) &&
> > - rt_task_fits_capacity(p, cpu))
> > + cpumask_test_cpu(cpu, p->cpus_ptr))
> > return 1;
> >
>
> Yes. We come here means rq has more than 1 runnable task, so we prefer
> spreading.

I might put this change only for this patch and send the fixes so they can go
before 5.6 is released. Unless the current patch ends up needing few tweaks
only.

>
> > return 0;
> > @@ -1860,6 +1859,7 @@ static int push_rt_task(struct rq *rq)
> > struct task_struct *next_task;
> > struct rq *lowest_rq;
> > int ret = 0;
> > + bool fit;
> >
> > if (!rq->rt.overloaded)
> > return 0;
> > @@ -1872,12 +1872,21 @@ static int push_rt_task(struct rq *rq)
> > if (WARN_ON(next_task == rq->curr))
> > return 0;
> >
> > + /*
> > + * The rq could be overloaded because it has unfitting task, if that's
> > + * the case they we need to try harder to find a better fitting CPU.
> > + */
> > + fit = rt_task_fits_capacity(next_task, cpu_of(rq));
> > +
> > /*
> > * It's possible that the next_task slipped in of
> > * higher priority than current. If that's the case
> > * just reschedule current.
> > + *
> > + * Unless next_task doesn't fit in this cpu, then continue with the
> > + * attempt to push it.
> > */
> > - if (unlikely(next_task->prio < rq->curr->prio)) {
> > + if (unlikely(next_task->prio < rq->curr->prio && fit)) {
> > resched_curr(rq);
> > return 0;
> > }
> > @@ -1920,6 +1929,33 @@ static int push_rt_task(struct rq *rq)
> > goto retry;
> > }
> >
> > + /*
> > + * Bail out if the task doesn't fit on either CPUs.
> > + *
> > + * Unless..
> > + *
> > + * * The priority of next_task is higher than current, then we
> > + * resched_curr(). We forced skipping this condition above.
> > + *
> > + * * The rq has more tasks to push, then we probably should push anyway
> > + * reduce the load on this rq.
> > + */
> > + if (!fit && !rt_task_fits_capacity(next_task, cpu_of(lowest_rq))) {
> > +
> > + /* we forced skipping this condition, so re-evaluate it */
> > + if (unlikely(next_task->prio < rq->curr->prio)) {
> > + resched_curr(rq);
> > + goto out_unlock;
> > + }
> > +
> > + /*
> > + * If there are more tasks to push, then the rq is overloaded
> > + * with more than just this task, so push anyway.
> > + */
> > + if (has_pushable_tasks(rq))
> > + goto out_unlock;

Hmm, I accidently inverted the logic too before posting the patch. It should
have been (!has_pushable_tasks).

>
> The next_task is not yet removed from the pushable_tasks list, so this will
> always be true and we skip pushing. Even if you add some logic, what happens

I saw the loop in push_rt_task[s]() and thought pick_next_pushable_task() is
what dequeues..

Is there another way to know how many tasks are pushable? Or do I need a new
helper?

> if the other task also does not fit? How would the task finally be pushed
> to other CPUs?

If we had 2 unfitting tasks; we'll push one and keep one. If we had 2 tasks,
one unfitting and one fitting, then we'll still push 1. I think the dequeue
should have cleared the overloaded state after pushing the first task anyway.

Does this answer your question or were you trying to point out something else?

>
> > + }
> > +
> > deactivate_task(rq, next_task, 0);
> > set_task_cpu(next_task, lowest_rq->cpu);
> > activate_task(lowest_rq, next_task, 0);
> > @@ -1927,6 +1963,7 @@ static int push_rt_task(struct rq *rq)
> >
> > resched_curr(lowest_rq);
> >
> > +out_unlock:
> > double_unlock_balance(rq, lowest_rq);
> >
> > out:
> > @@ -2223,7 +2260,14 @@ static void task_woken_rt(struct rq *rq, struct task_struct *p)
> > (rq->curr->nr_cpus_allowed < 2 ||
> > rq->curr->prio <= p->prio);
> >
> > - if (need_to_push || !rt_task_fits_capacity(p, cpu_of(rq)))
> > + bool fit = rt_task_fits_capacity(p, cpu_of(rq));
> > +
> > + if (!fit && !rq->rt.overloaded) {
> > + rt_set_overload(rq);
> > + rq->rt.overloaded = 1;
> > + }
> > +
> > + if (need_to_push || !fit)
> > push_rt_tasks(rq);
> > }
>
> If this is the only RT task queued on this CPU and current task is not a RT
> task, why are we calling push_rt_tasks() in case if unfit scenario. In most
> cases, the task is woken on this rq, because there is no other higher capacity
> rq that can take this task for whatever reason.

But what if the big CPU is now free for it to run?

I need to add a similar condition to switched_to_rt() otherwise the
push_rt_tasks() would be a NOP.

>
> btw, if we set overload condition with just 1 RT task, we keep receiving the
> IPI (irq work) to push the task and we can't do that because a running RT
> task can't be pushed.

Good point.

We can teach the logic not to send IPIs in this case?

Given all of that. I might still agree this might be unnecessary work. So I'll
split it out into a separate patch anyway and send an updated series of the
fixes we agreed on to go into 5.6.

Thanks!

--
Qais Yousef