Re: Questions about transition latency and LATENCY_MULTIPLIER

From: Viresh Kumar
Date: Wed May 29 2024 - 03:10:01 EST


Hi Qais,

On 28-05-24, 02:21, Qais Yousef wrote:
> Hi
>
> I am trying to understand the reason behind the use of LATENCY_MULTIPLIER
> to create transition_delay_us. It is set to 1000 by default, and when I tried to
> dig into the history I couldn't reach the original commit. The code has gone
> through many transformations, and I gave up trying to find the commit that
> introduced it.

It came along with the initial commits for the conservative and ondemand
governors, i.e. before 2005.

> Generally I am seeing that rate_limit_us in schedutil (which is largely
> influenced by this multiplier on most/all systems I am working on) is too high
> compared to the cpuinfo_transition_latency reported by the driver.
>
> For example, on my M1 Mac mini I get 50 and 56 us. rate_limit_us is 10 ms (on the
> 6.8 kernel; it should become 2 ms after my fix).
>
> $ grep . /sys/devices/system/cpu/cpufreq/policy*/cpuinfo_transition_latency
> /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_transition_latency:50000
> /sys/devices/system/cpu/cpufreq/policy4/cpuinfo_transition_latency:56000
>
> On AMD Ryzen it reads 0, and we end up with LATENCY_MULTIPLIER (1000 us = 1 ms)
> as the rate_limit_us.
>
> On an Intel i5 I get 20 us, but rate_limit_us is 5 ms, which is requested
> explicitly by the intel_pstate driver.
>
> $ grep . /sys/devices/system/cpu/cpufreq/policy*/cpuinfo_transition_latency
> /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_transition_latency:20000
> /sys/devices/system/cpu/cpufreq/policy1/cpuinfo_transition_latency:20000
> /sys/devices/system/cpu/cpufreq/policy2/cpuinfo_transition_latency:20000
> /sys/devices/system/cpu/cpufreq/policy3/cpuinfo_transition_latency:20000
> /sys/devices/system/cpu/cpufreq/policy4/cpuinfo_transition_latency:20000
> /sys/devices/system/cpu/cpufreq/policy5/cpuinfo_transition_latency:20000
> /sys/devices/system/cpu/cpufreq/policy6/cpuinfo_transition_latency:20000
> /sys/devices/system/cpu/cpufreq/policy7/cpuinfo_transition_latency:20000
>
> The question I have is: why so high? If hardware got so good, why can't we
> leverage its ability to change frequencies quickly and do so more often?

From my understanding, this is about not changing the frequency too often.
That's all. The value is historical, and probably we didn't get better numbers
when it was reduced to a lower value later on either.
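For reference, the numbers quoted above follow from a simple calculation. Below
is a hedged reconstruction, assuming the default is computed roughly the way
cpufreq_policy_transition_delay_us() did around v6.8: a driver-supplied
transition_delay_us wins; otherwise the latency in us is multiplied by
LATENCY_MULTIPLIER and capped; a latency of 0 falls back to LATENCY_MULTIPLIER
us. The helper default_delay_us() and its cap_us parameter are hypothetical, so
check the actual tree for the exact code.

```c
#include <assert.h>

#define LATENCY_MULTIPLIER 1000U /* default multiplier discussed above */
#define NSEC_PER_USEC      1000U

/*
 * Hypothetical helper: default rate limit (us) when the driver does not set
 * transition_delay_us itself. cap_us models the upper bound: 10000 (10 ms)
 * on 6.8, 2000 (2 ms) after the fix mentioned above.
 */
static unsigned int default_delay_us(unsigned int driver_delay_us,
				     unsigned int latency_ns,
				     unsigned int cap_us)
{
	unsigned int latency_us;

	assert(cap_us > 0); /* a zero cap would disable rate limiting */

	/* A driver-provided value (e.g. from intel_pstate) always wins. */
	if (driver_delay_us)
		return driver_delay_us;

	latency_us = latency_ns / NSEC_PER_USEC;
	if (latency_us) {
		/* Scale the hardware latency up, but never beyond the cap. */
		unsigned long long scaled =
			(unsigned long long)latency_us * LATENCY_MULTIPLIER;
		return scaled < cap_us ? (unsigned int)scaled : cap_us;
	}

	/* Latency unknown (reported as 0): fall back to 1000 us = 1 ms. */
	return LATENCY_MULTIPLIER;
}
```

With the M1 numbers, default_delay_us(0, 50000, 10000) gives 10000 us, or 2000
us with a 2 ms cap, and the AMD case default_delay_us(0, 0, 10000) gives 1000
us. On fast hardware the multiplier pushes the result straight to the cap, so
the cap rather than the hardware latency ends up deciding the rate limit.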

> This is important because, due to uclamp usage, we can end up with less gradual
> transitions between frequencies, and we can jump up and down more often. The
> smaller this value is, the better we can handle fast transitions to boost or
> cap frequencies based on a task's requirements when it context switches.
> But the rate limit is generally too high for the hardware, and I wanted to
> understand whether this is purely historical or we still have reasons to worry.

Maybe Rafael knows other reasons, but this is all I remember.
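As a rough illustration of why uclamp makes transitions less gradual, the sketch
below assumes a schedutil-like mapping with 25% headroom (freq ~ 1.25 * max_freq
* util / 1024) and applies a per-task clamp to the utilization before mapping it.
It ignores run-queue clamp aggregation, the policy min limit and frequency
tables. The function names are hypothetical.

```c
#include <assert.h>

#define SCHED_CAPACITY_SCALE 1024U

/* Clamp a utilization value into [lo, hi], as uclamp does conceptually. */
static unsigned int clamp_util(unsigned int util, unsigned int lo,
			       unsigned int hi)
{
	assert(lo <= hi);
	if (util < lo)
		return lo;
	if (util > hi)
		return hi;
	return util;
}

/*
 * Hypothetical util -> frequency mapping: add 25% headroom to the
 * utilization, scale it to max_khz and never exceed max_khz.
 */
static unsigned int util_to_khz(unsigned int util, unsigned int max_khz)
{
	unsigned long long khz = (unsigned long long)max_khz *
		(util + (util >> 2)) / SCHED_CAPACITY_SCALE;

	return khz > max_khz ? max_khz : (unsigned int)khz;
}
```

With max_khz = 3200000 and util = 300, a boosted task (clamp [1024, 1024]) maps
to 3200000 kHz and a capped task (clamp [0, 200]) maps to 781250 kHz, while the
unclamped value would be 1171875 kHz. Alternating between such tasks on context
switch asks for a jump between the two extremes each time, and the rate limit
may then filter those requests out.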

> From what I've seen so far, it seems to me this higher rate limit is helping
> performance, as bursty tasks are more likely to find the CPU running at higher
> frequencies due to this behavior. I think I can help these bursty tasks in
> other ways without accidentally relying on this value being high.
>
> Is there any concern with using cpuinfo_transition_latency as-is if the driver
> doesn't provide transition_delay_us?

Won't we keep changing the frequency continuously in that case? Or am I
misunderstanding something?
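To put a number on that concern, here is a minimal sketch of a time-based rate
limiter, loosely modelled on the kind of check schedutil does before acting on a
new request (compare the time since the last frequency update against a delay).
The struct, the function names and the 10 us request period used below are
hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL
#define NSEC_PER_SEC  1000000000ULL

/* Hypothetical limiter state: time of the last change and minimum gap. */
struct rate_limiter {
	uint64_t last_update_ns;
	uint64_t delay_ns;
};

/* Allow a frequency change only if delay_ns has elapsed since the last one. */
static int should_update(struct rate_limiter *rl, uint64_t now_ns)
{
	if (now_ns - rl->last_update_ns < rl->delay_ns)
		return 0; /* too soon: keep the current frequency */
	rl->last_update_ns = now_ns;
	return 1;
}

/*
 * Count the frequency changes allowed in one second if a new frequency is
 * requested every request_period_ns. Starts as if an update happened at t = 0.
 */
static unsigned int updates_per_second(uint64_t delay_us,
				       uint64_t request_period_ns)
{
	struct rate_limiter rl = {
		.last_update_ns = 0,
		.delay_ns = delay_us * NSEC_PER_USEC,
	};
	unsigned int count = 0;
	uint64_t t;

	assert(request_period_ns > 0);
	for (t = request_period_ns; t <= NSEC_PER_SEC; t += request_period_ns)
		count += should_update(&rl, t);
	return count;
}
```

Assuming a new request every 10 us, updates_per_second(50, 10000) allows 20000
frequency changes per second with a 50 us delay, versus 500 with a 2 ms delay
(updates_per_second(2000, 10000)).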

> And does the kernel/driver contract need to cater for errors in the driver's
> ability to serve the request? Can our request be silently ignored by the
> hardware?

The cpufreq core maintains its state machine, and failures are used to inform
the user and/or stop DVFS. This makes for a clean approach; I'm not sure what we
would gain or lose by ignoring the errors.

> Not necessarily due to rate limit being ignored, but for any other
> reason? It is important for Linux to know what frequency we're actually running
> at.

For one, we report two frequencies to userspace:
- scaling_cur_freq: the frequency that the software thinks the hardware runs at
(i.e. the last requested frequency).

- cpuinfo_cur_freq: the real frequency the hardware is running at. It can be
calculated using counters, etc.

There are tools that use both, so these are required.
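For the counter-based approach, a minimal sketch assuming two free-running
counters, one ticking at the actual core frequency and one at a fixed reference
frequency (x86 APERF/MPERF and the Arm AMU cycle counters work along these
lines). The average frequency over an interval is the reference frequency scaled
by the ratio of the two deltas. The struct and function names are hypothetical,
and reading the counters themselves is left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * One hypothetical sample of two counters: "actual" counts cycles at the
 * current core frequency, "ref" counts cycles at a fixed reference frequency.
 */
struct freq_sample {
	uint64_t actual;
	uint64_t ref;
};

/*
 * Average effective frequency (kHz) between two samples. The product
 * ref_khz * d_actual fits in 64 bits only for short sampling intervals.
 */
static uint64_t effective_khz(struct freq_sample prev, struct freq_sample cur,
			      uint64_t ref_khz)
{
	uint64_t d_actual = cur.actual - prev.actual;
	uint64_t d_ref = cur.ref - prev.ref;

	assert(d_ref > 0); /* need some elapsed reference time */
	return ref_khz * d_actual / d_ref;
}
```

If the last request (scaling_cur_freq) was 2000000 kHz, but over an interval the
actual counter advanced 1500000 while the reference counter, running at 2000000
kHz, advanced 2000000, effective_khz() reports 1500000 kHz, showing that the
request was not fully honoured.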

> Some hardware provides a counter that can be read to discover that. But
> a lot of systems rely on the request we sent actually being honoured, and
> failures can mean that things like EAS misbehave.

--
viresh