Re: [RFC PATCH 0/7] sched: cpufreq: Remove magic margins

From: Qais Yousef
Date: Fri Sep 08 2023 - 10:08:04 EST


On 09/08/23 09:40, Dietmar Eggemann wrote:
> On 08/09/2023 02:17, Qais Yousef wrote:
> > On 09/07/23 15:08, Peter Zijlstra wrote:
> >> On Mon, Aug 28, 2023 at 12:31:56AM +0100, Qais Yousef wrote:
>
> [...]
>
> > But as for the 0.8 and 1.25 margins, the real problem is that 25% is
> > too aggressive/fast and wastes power. I'm actually slowing things down
> > as a result of this series, and I'm expecting some not to be happy
> > about that on their systems. The response_time_ms was my way to give
> > back control. I didn't see how I could make things faster and slower at
> > the same time without making decisions on behalf of the user/sysadmin.
> >
> > So the connection I see between PELT and the margins or headrooms in
> > fits_capacity() and map_util_perf()/dvfs_headroom is that they expose the need
> > to manage the perf/power trade-off of the system.
> >
> > In particular, the default is not good for modern systems: Cortex-X is
> > too powerful, but we still operate within the same power and thermal
> > budgets.
> >
> > And what was a high end A78 is a mid core today. So if you look at
> > today's mobile world topology we really have a tiny+big+huge
> > combination of cores. The bigs are called mids, but they're very
> > capable. fits_capacity() forces migration to the 'huge' cores too soon
> > with that 80% margin, while 80% might be too small for the tiny ones,
> > as some workloads really struggle there if they hang on for too long.
> > It doesn't help that these systems ship with a 4ms tick. Something more
> > to consider changing, I guess.
>
> If this is the problem then you could simply make the margin (headroom)
> a function of cpu_capacity_orig?

I don't see what you mean. Use it instead of capacity_of(), but keep the 80%?

Again, I could be delusional and misunderstanding everything, but to me
fits_capacity() is really about misfit detection. And a task is not truly
misfit until its util actually goes above the capacity of the CPU it is
on. Now, due to implementation details, there can be a delay between the
task crossing this capacity and us being able to move it, which is what
I believe this headroom is trying to account for.
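
(For reference, the two magic margins under discussion look roughly like
this in current kernels - quoting from memory, so double check against
your tree:

        /* kernel/sched/fair.c: util "fits" only below ~80% of capacity */
        #define fits_capacity(cap, max) ((cap) * 1280 < (max) * 1024)

        /* include/linux/sched/cpufreq.h: the 1.25 dvfs headroom */
        static inline unsigned long map_util_perf(unsigned long util)
        {
                return util + (util >> 2);
        }

Both hardcode the same 1.25 factor regardless of how big the CPU is or
how long it actually takes us to react.)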

I think we can define this better by tying the headroom to the worst-case
time it takes to actually move a misfit task to the right CPU. If the
task can keep running through this delay without crossing the capacity of
the CPU it is on, then we should not trigger misfit, IMO.
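
Something like the below is the direction I have in mind. Completely
untested and the helper name is made up, it's only meant to illustrate
the idea: if the worst case delay to act on a misfit task is D ms (tick,
load balance interval, etc), then the task only needs to be flagged when,
running flat out for D more ms, it would cross the capacity of the CPU it
is on:

        /*
         * PELT: a continuously running task's util moves towards 1024 as
         *
         *      u(t + d) = u(t) * y^d + 1024 * (1 - y^d),  y^32ms = 0.5
         *
         * So instead of a fixed 80% threshold, flag misfit only when the
         * projected util after the worst case reaction delay no longer
         * fits. decay_factor(d) is a made up helper standing for y^d in
         * fixed point (1024 == 1.0); it could be built on top of the
         * decay maths we already have in pelt.c.
         */
        static inline bool fits_capacity_in(unsigned long util,
                                            unsigned long capacity,
                                            unsigned int delay_ms)
        {
                unsigned long y_d = decay_factor(delay_ms);
                unsigned long projected;

                /* projected util after running flat out for delay_ms */
                projected = (util * y_d + 1024 * (1024 - y_d)) >> 10;

                return projected <= capacity;
        }

For example with D = 4ms (one tick), y^4 is ~0.917, so a task at util 900
on a 1024 capacity CPU projects to only ~910 and still fits, while the
current 80% rule (819 for a 1024 CPU) would have flagged it as misfit
already.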

>
> [...]
>
> > There's a question that I'm struggling with, if I may ask. Why is our
> > constant response time (practically ~200ms to go from 0 to max)
> > perceived as a good fit for all use cases? The capability of systems
> > differs widely in terms of what performance you get at, say, a util of
> > 512. Or in other words, how much work is done in a unit of time differs
> > between systems, but we still represent that work in a constant way.
> > A task that ran for 10ms on powerful System A would have done much more
> > work than the same task running for 10ms on a less capable System B.
>
> PELT (util_avg) is uarch & frequency invariant.
>
> So e.g. a task with util_avg = 256 could have a runtime/period
>
> on big CPU (capacity = 1024) of 4ms/16ms
>
> on little CPU (capacity = 512) of 8ms/16ms
>
> The amount of work is invariant (so we can compare between asymmetric
> capacity CPUs) but the runtime obviously differs according to the capacity.

Yep!
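
(And FWIW, to spell out where the ~200ms I mentioned above comes from:
with PELT's 32ms halflife, a task running flat out ramps roughly as

        util(t) ~= 1024 * (1 - 0.5^(t/32ms))

so it only reaches ~98% of max after about 6 halflives, i.e. ~192ms. That
time constant is the same however capable the CPU is; only the amount of
work done during it differs, as you say.)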


Cheers

--
Qais Yousef