Re: [RFC PATCH 0/7] sched: cpufreq: Remove magic margins

From: Dietmar Eggemann
Date: Fri Sep 08 2023 - 03:41:37 EST

Next message: Ankit Kumar: "[PATCH v2 1/2] block:t10-pi: remove redundant Type2 check during t10 PI verify"
Previous message: Jan Kara: "Re: [PATCH] fix writing to the filesystem after unmount"
Next in thread: Qais Yousef: "Re: [RFC PATCH 0/7] sched: cpufreq: Remove magic margins"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 08/09/2023 02:17, Qais Yousef wrote:
> On 09/07/23 15:08, Peter Zijlstra wrote:
>> On Mon, Aug 28, 2023 at 12:31:56AM +0100, Qais Yousef wrote:

[...]

> But for the 0.8 and 1.25 margin problems, actually the problem is that 25% is
> too aggressive/fast and wastes power. I'm actually slowing things down as
> a result of this series. And I'm expecting some not to be happy about it on
> their systems. The response_time_ms was my way to give back control. I didn't
> see how I can make things faster and slower at the same time without making
> decisions on behalf of the user/sysadmin.
>
> So the connection I see between PELT and the margins or headrooms in
> fits_capacity() and map_util_perf()/dvfs_headroom is that they expose the need
> to manage the perf/power trade-off of the system.
>
> Particularly the default is not good for the modern systems, Cortex-X is too
> powerful but we still operate within the same power and thermal budgets.
>
> And what was a high end A78 is a mid core today. So if you look at today's
> mobile world topology we really have a tiny+big+huge combination of cores. The
> bigs are called mids, but they're very capable. Fits capacity forces migration
> to the 'huge' cores too soon with that 80% margin. While the 80% might be too
> small for the tiny ones as some workloads really struggle there if they hang on
> for too long. It doesn't help that these systems ship with 4ms tick. Something
> more to consider changing I guess.

If this is the problem then you could simply make the margin (headroom)
a function of cpu_capacity_orig?
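
As a rough illustration of what that could look like, here is a small
userspace model, not kernel code. fits_fixed() mirrors the 1280/1024
ratio used by fits_capacity(). headroom_pct() and fits_scaled() are
hypothetical names, and the 40%..10% linear ramp is an arbitrary
assumption chosen only to show the shape of the idea.

```python
SCHED_CAPACITY_SCALE = 1024


def fits_fixed(util, capacity):
    """Fixed margin: util fits while it is below ~80% of capacity."""
    return util * 1280 < capacity * 1024


def headroom_pct(cpu_capacity_orig):
    """Hypothetical headroom: 40% for the smallest CPUs, shrinking
    linearly to 10% at capacity 1024 (numbers are assumptions)."""
    return 40 - (30 * cpu_capacity_orig) // SCHED_CAPACITY_SCALE


def fits_scaled(util, cpu_capacity_orig):
    """Same check as fits_fixed(), with a capacity-dependent margin."""
    margin = 100 + headroom_pct(cpu_capacity_orig)
    return util * margin < cpu_capacity_orig * 100


if __name__ == "__main__":
    # (util, cpu_capacity_orig) pairs: a tiny CPU and a mid CPU.
    for util, cap in ((200, 256), (630, 768)):
        print(cap, util, fits_fixed(util, cap), fits_scaled(util, cap))
```

With these made-up numbers a task at util 200 is moved off a
capacity-256 CPU earlier than with the fixed margin, while a task at
util 630 is allowed to stay on a capacity-768 CPU for longer.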

[...]

> There's a question that I'm struggling with if I may ask. Why is it perceived
> our constant response time (practically ~200ms to go from 0 to max) as a good
> fit for all use cases? Capability of systems differs widely in terms of what
> performance you get at say a util of 512. Or in other words how much work is
> done in a unit of time differs between systems, but we still represent that work
> in a constant way. A task ran for 10ms on powerful System A would have done

PELT (util_avg) is uarch & frequency invariant.

So e.g. a task with util_avg = 256 could have a runtime/period

on big CPU (capacity = 1024) of 4ms/16ms

on little CPU (capacity = 512) of 8ms/16ms

The amount of work is invariant (so we can compare between asymmetric
capacity CPUs) but the runtime obviously differs according to the capacity.
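
The arithmetic behind the two examples can be checked with a few lines
of Python. This is an idealised steady-state model (periodic task, CPU
at its maximum frequency). runtime_ms() and work_units() are helper
names made up for this sketch.

```python
def runtime_ms(util_avg, cpu_capacity, period_ms):
    """Runtime per period needed to produce util_avg on this CPU."""
    return util_avg * period_ms / cpu_capacity


def work_units(runtime, cpu_capacity):
    """Capacity-scaled work done in one period (arbitrary units)."""
    return runtime * cpu_capacity


if __name__ == "__main__":
    for name, cap in (("big", 1024), ("little", 512)):
        rt = runtime_ms(256, cap, 16)
        print(name, rt, work_units(rt, cap))
```

Both CPUs end up with the same capacity-scaled work per period. Only
the time needed to do it differs.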

[...]
