Questions about transition latency and LATENCY_MULTIPLIER
From: Qais Yousef
Date: Mon May 27 2024 - 21:21:29 EST
Hi
I am trying to understand the reason behind the usage of LATENCY_MULTIPLIER
to create transition_delay_us. It is set to 1000 by default, and when I tried
to dig into the history I couldn't reach the original commit: the code has gone
through many transformations and I gave up looking for the commit that first
introduced it.
Generally I am seeing that rate_limit_us in schedutil (which is largely
influenced by this multiplier on most/all systems I am working on) is too high
compared to the cpuinfo_transition_latency reported by the driver.
For example, on my M1 Mac mini I get 50 and 56us, while rate_limit_us is 10ms
(on a 6.8 kernel; it should become 2ms after my fix):
$ grep . /sys/devices/system/cpu/cpufreq/policy*/cpuinfo_transition_latency
/sys/devices/system/cpu/cpufreq/policy0/cpuinfo_transition_latency:50000
/sys/devices/system/cpu/cpufreq/policy4/cpuinfo_transition_latency:56000
On AMD Ryzen it reads 0, and I end up with LATENCY_MULTIPLIER (1000us = 1ms) as
the rate_limit_us.
On an Intel i5 I get 20us, but rate_limit_us is 5ms, which is requested
explicitly by the intel_pstate driver:
$ grep . /sys/devices/system/cpu/cpufreq/policy*/cpuinfo_transition_latency
/sys/devices/system/cpu/cpufreq/policy0/cpuinfo_transition_latency:20000
/sys/devices/system/cpu/cpufreq/policy1/cpuinfo_transition_latency:20000
/sys/devices/system/cpu/cpufreq/policy2/cpuinfo_transition_latency:20000
/sys/devices/system/cpu/cpufreq/policy3/cpuinfo_transition_latency:20000
/sys/devices/system/cpu/cpufreq/policy4/cpuinfo_transition_latency:20000
/sys/devices/system/cpu/cpufreq/policy5/cpuinfo_transition_latency:20000
/sys/devices/system/cpu/cpufreq/policy6/cpuinfo_transition_latency:20000
/sys/devices/system/cpu/cpufreq/policy7/cpuinfo_transition_latency:20000
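
As a rough illustration of the arithmetic behind the numbers above, here is a
small Python model. It is a sketch, not the kernel code: the function name and
its parameters (driver_delay_us, cap_us) are made up for this example, and the
cap values simply mirror the 10ms seen on 6.8 and the 2ms expected after the
fix.

```python
NSEC_PER_USEC = 1000
LATENCY_MULTIPLIER = 1000  # the default multiplier discussed in this mail


def default_rate_limit_us(transition_latency_ns, driver_delay_us=0, cap_us=10_000):
    """Hypothetical model of how the default rate_limit_us comes about.

    transition_latency_ns: value read from cpuinfo_transition_latency (ns).
    driver_delay_us: transition_delay_us requested by the driver, 0 if none.
    cap_us: upper bound on the result (assumed: 10ms on 6.8, 2ms after the fix).
    """
    # A delay requested explicitly by the driver wins (the intel_pstate case).
    if driver_delay_us:
        return driver_delay_us

    latency_us = transition_latency_ns // NSEC_PER_USEC

    # A reported latency of 0 falls back to the bare multiplier (the Ryzen case).
    if latency_us == 0:
        return LATENCY_MULTIPLIER

    # Otherwise scale the latency by the multiplier and apply the cap.
    return min(latency_us * LATENCY_MULTIPLIER, cap_us)


if __name__ == "__main__":
    print(default_rate_limit_us(50_000))                         # M1, 6.8: 10000
    print(default_rate_limit_us(56_000, cap_us=2_000))           # M1, 2ms cap: 2000
    print(default_rate_limit_us(0))                              # Ryzen: 1000
    print(default_rate_limit_us(20_000, driver_delay_us=5_000))  # Intel: 5000
```

In this model a 50us hardware latency becomes 50ms before the cap is applied,
which is why the cap, not the hardware, ends up deciding the rate limit.
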
The question I have is: why so high? If hardware has got so good, why can't we
leverage its ability to change frequencies quickly and do so more often?
This is important because, due to uclamp usage, we can end up with less gradual
transitions between frequencies and we can jump up and down more often. The
smaller this value is, the better we can handle fast transitions to boost or
cap frequencies based on a task's requirements when it context switches.
But the rate limit is generally too high for the hardware, and I wanted to
understand whether this is purely historical or whether we still have reasons
to worry.
From what I've seen so far, it seems to me this higher rate limit is helping
performance, as bursty tasks are more likely to find the CPU running at higher
frequencies due to this behavior. I think this is something I can help these
bursty tasks with, without relying accidentally on the rate limit being high.
Is there any concern with using cpuinfo_transition_latency as is if the driver
doesn't provide transition_delay_us?
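
For a sense of what is at stake, here is a toy Python model of a rate limiter.
It is a deliberately simplified sketch: it only checks the time since the last
accepted request and ignores everything else schedutil takes into account. The
function name and the 500us request period are assumptions chosen purely for
illustration.

```python
def applied_requests(request_times_us, rate_limit_us):
    """Return the request timestamps that pass a simple rate limiter.

    A request is accepted only if at least rate_limit_us has elapsed since
    the previously accepted one; the first request is always accepted.
    """
    applied = []
    last = None
    for t in request_times_us:
        if last is None or t - last >= rate_limit_us:
            applied.append(t)
            last = t
    return applied


if __name__ == "__main__":
    # Hypothetical workload: a frequency request every 500us for 10ms,
    # e.g. tasks with different uclamp values context switching.
    requests = list(range(0, 10_000, 500))

    for limit_us in (10_000, 2_000, 50):
        served = applied_requests(requests, limit_us)
        print(f"rate_limit_us={limit_us}: {len(served)} of {len(requests)} applied")
```

With these made-up inputs, a 10ms limit lets 1 of the 20 requests through, a
2ms limit lets 5 through, and a 50us limit (the latency the M1 reports) lets
all 20 through.
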
And does the kernel/driver contract need to cater for errors in the driver's
ability to serve the request? Can our request be silently ignored by the
hardware? Not necessarily because the rate limit is ignored, but for any other
reason? It is important for Linux to know what frequency we're actually running
at. Some hardware provides a counter we can read to discover that, but a lot
of systems rely on the request we sent actually being honoured. Failures there
can mean that things like EAS will misbehave.
Thanks!
--
Qais Yousef