Re: [PATCH v2] sched: rt: Make RT capacity aware

From: Qais Yousef
Date: Wed Oct 30 2019 - 13:43:25 EST


On 10/30/19 12:57, Dietmar Eggemann wrote:
> On 09.10.19 12:46, Qais Yousef wrote:
>
> [...]
>
> > Changes in v2:
> > - Use cpupri_find() to check the fitness of the task instead of
> > sprinkling find_lowest_rq() with several checks of
> > rt_task_fits_capacity().
> >
> > The selected implementation opted to pass the fitness function as an
> > argument rather than call rt_task_fits_capacity() capacity which is
> > a cleaner to keep the logical separation of the 2 modules; but it
> > means the compiler has less room to optimize rt_task_fits_capacity()
> > out when it's a constant value.
>
> I would prefer exporting rt_task_fits_capacity() sched-internally via
> kernel/sched/sched.h. Less code changes and the indication whether
> rt_task_fits_capacity() has to be used in cpupri_find() is already given
> by lowest_mask being !NULL or NULL.
>

I don't mind if the maintainers agree too. The reason I did that way is because
it keeps the implementation of cpupri generic and self contained.

rt_task_fits_capacity() at the moment is a static function in rt.c
To use it in cpupri_find() I either need to make it public somewhere or have an

extern bool rt_task_fits_capacity(...);

in cpupri.c. Neither of which appealed to me personally.

> [...]
>
> > +inline bool rt_task_fits_capacity(struct task_struct *p, int cpu)
> > +{
> > + unsigned int min_cap;
> > + unsigned int max_cap;
> > + unsigned int cpu_cap;
>
> Nit picking. Since we discussed it already,
>
> I found this "Also please try to aggregate variables of the same type
> into a single line. There is no point in wasting screen space::" ;-)
>
> https://lore.kernel.org/r/20181107171149.165693799@xxxxxxxxxxxxx

That wasn't merged at the end AFAICT :)

It's not my preferred style in this case if I have the choice - but I promise
to change it if I ended up having to spin another version anyway :)

>
> [...]
>
> > @@ -2223,7 +2273,10 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
> > */
> > if (task_on_rq_queued(p) && rq->curr != p) {
> > #ifdef CONFIG_SMP
> > - if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
> > + bool need_to_push = rq->rt.overloaded ||
> > + !rt_task_fits_capacity(p, cpu_of(rq));
> > +
> > + if (p->nr_cpus_allowed > 1 && need_to_push)
> > rt_queue_push_tasks(rq);
> > #endif /* CONFIG_SMP */
> > if (p->prio < rq->curr->prio && cpu_online(cpu_of(rq)))
> What happens to a always running CFS task which switches to RT? Luca
> introduced a special migrate callback (next to push and pull)
> specifically to deal with this scenario. A lot of new infrastructure for
> this one use case, but still, do we care for it in RT as well?
>
> https://lore.kernel.org/r/20190506044836.2914-4-luca.abeni@xxxxxxxxxxxxxxx
>

Good question. This scenario and the one where uclamp values are changed while
an RT task is running are similar.

In both cases the migration will happen on the next activation/wakeup AFAICS.

I am not sure an always running rt/deadline task is something conceivable in
practice and we want to cater for. It is certainly something we can push a fix
for in the future on top of this. IMHO we're better off trying to keep the
complexity low until we have a real scenario/use case that justifies it.

As it stands when the system is overloaded or when there are more RT tasks than
big cores we're stuffed too. And there are probably more ways where we can
shoot ourselves in the foot. Using RT and deadline is hard and that's one of
their unavoidable plagues I suppose.

Thanks

--
Qais Yousef