Re: [PATCH v2] sched: rt: Make RT capacity aware

From: Qais Yousef
Date: Tue Oct 29 2019 - 07:02:30 EST


On 10/29/19 09:13, Vincent Guittot wrote:
> On Wed, 9 Oct 2019 at 12:46, Qais Yousef <qais.yousef@xxxxxxx> wrote:
> >
> > Capacity Awareness refers to the fact that on heterogeneous systems
> > (like Arm big.LITTLE), the capacity of the CPUs is not uniform, hence
> > when placing tasks we need to be aware of this difference of CPU
> > capacities.
> >
> > In such scenarios we want to ensure that the selected CPU has enough
> > capacity to meet the requirement of the running task. Enough capacity
> > means here that capacity_orig_of(cpu) >= task.requirement.
> >
> > The definition of task.requirement is dependent on the scheduling class.
> >
> > For CFS, utilization is used to select a CPU whose capacity is >= the
> > cfs_task.util.
> >
> > capacity_orig_of(cpu) >= cfs_task.util
> >
> > DL isn't capacity aware at the moment but can make use of the bandwidth
> > reservation to implement that in a manner similar to how CFS uses
> > utilization.
> > The following patchset implements that:
> >
> > https://lore.kernel.org/lkml/20190506044836.2914-1-luca.abeni@xxxxxxxxxxxxxxx/
> >
> > capacity_orig_of(cpu)/SCHED_CAPACITY_SCALE >= dl_runtime/dl_deadline
> >
> > For RT we don't have a per task utilization signal and we lack any
> > information in general about what performance requirement the RT task
> > needs. But with the introduction of uclamp, RT tasks can now control
> > that by setting uclamp_min to guarantee a minimum performance point.
> >
> > ATM the uclamp values are only used for frequency selection; but on
> > heterogeneous systems this is not enough and we need to ensure that the
> > capacity of the CPU is >= uclamp_min, which is what is implemented here.
> >
> > capacity_orig_of(cpu) >= rt_task.uclamp_min
> >
> > Note that by default uclamp.min is 1024, which means that RT tasks will
> > always be biased towards the big CPUs, which makes for better, more
> > predictable behavior in the default case.
>
> hmm... big cores are not always the best choice for rt tasks, they
> generally take more time to wake up or to switch context because of
> pipeline depth, branch prediction and the like

Can you quantify this with a number? I suspect this latency is in the
200-300us range, and the difference between little and big should be much
smaller than that, no? In general we can't give guarantees of that order in
Linux, and serious real-time users have to do extra tweaks anyway, like
disabling power management, which can introduce latency and hinder determinism,
besides enabling PREEMPT_RT.

For generic systems a few ms is the best we can give, and without any tweaks
we can easily fall outside of that.

The choice of going to the maximum performance point in the system for RT tasks
by default goes beyond this patch anyway. I'm just making it consistent here
since we have different performance levels and RT didn't understand this
before.

So what I'm doing here is just making things consistent rather than changing
the default.
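
To illustrate the rule being discussed, here is a rough user-space sketch of
the fitness check capacity_orig_of(cpu) >= rt_task.uclamp_min. It is not the
patch itself: struct rt_task_sketch, rt_task_fits() and first_fitting_cpu()
are names made up for this example, and the CPU capacities a caller passes in
are hypothetical values on the 0..1024 scale.

```c
#include <stdbool.h>
#include <stddef.h>

/* Capacity of the biggest CPU in the system, same scale the kernel uses. */
#define SCHED_CAPACITY_SCALE 1024u

/* Hypothetical stand-in for an RT task; only the field this check needs. */
struct rt_task_sketch {
	unsigned int uclamp_min;	/* requested minimum performance, 0..1024 */
};

/* True when the CPU's original capacity satisfies the task's uclamp_min. */
static bool rt_task_fits(const struct rt_task_sketch *p,
			 unsigned int cpu_capacity_orig)
{
	return cpu_capacity_orig >= p->uclamp_min;
}

/*
 * Return the index of the first CPU that fits the task, or -1 if none does.
 * A real scheduler would also weigh priority and affinity; this sketch only
 * shows the capacity comparison.
 */
static int first_fitting_cpu(const struct rt_task_sketch *p,
			     const unsigned int *capacity_orig, size_t ncpus)
{
	for (size_t i = 0; i < ncpus; i++)
		if (rt_task_fits(p, capacity_orig[i]))
			return (int)i;
	return -1;
}
```

With uclamp_min left at SCHED_CAPACITY_SCALE only a CPU of capacity 1024
fits, which is the bias towards big CPUs mentioned in the commit message. A
task that lowers its uclamp_min also fits on the smaller CPUs.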

What do you suggest?

Thanks

--
Qais Yousef