Re: [GIT PULL] sched_ext: Initial pull request for v6.11

From: Russell Haley
Date: Wed Jul 31 2024 - 22:51:03 EST


> We really shouldn't change how schedutil works. The governor is supposed to
> behave in a certain way, and we need to ensure consistency. I think you should
> look on how you make your scheduler compatible with it. Adding hooks to say
> apply this perf value that I want is a recipe for randomness.

If schedutil's behavior is perfect as-is, then why does cpu.uclamp.max
not work with values between 81-100%, which is the part of the CPU
frequency range where one pays the least in performance per Joule saved?
Why does cpu.uclamp.min have to be set all the way up and down the
cgroup hierarchy, from root to leaf, to actually affect frequency
selection? Why is sugov notorious for harming video encoding
performance[1], which is a CPU-saturating workload? Why do intel_pstate
and amd-pstate both bypass it on modern hardware?

It appears that without Android's very deeply integrated userspace
uclamp controls telling sugov what to do, its native behavior is less
than awe-inspiring. Furthermore, uclamp doesn't work especially well on
systems that violate the big.LITTLE assumption that only clamping << max
saves meaningful energy[2]. Non-Android users widely scorn sugov when
they become aware of it. Web forums are full of suggestions to switch to
perfgov, or to switch to "conservative" or disable turbo for those who
want efficiency.

That said, given how long the PELT time constant is, a bpf scheduler
that wanted to override sugov could probably cooperate with a userspace
daemon to set min and max uclamps to the same value to control frequency
selection without too much overhead, as long as it doesn't mind the
81-100% hole.
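
To make that concrete, here is a rough sketch of what the daemon side
could look like. It is an illustration, not tested against a real
scheduler. It assumes cgroup v2 mounted at /sys/fs/cgroup with the cpu
controller enabled at every level, and that the cpu.uclamp.* files
accept a percentage with two decimals. The function names are made up.
It writes cpu.uclamp.min at every level below the root (per the
complaint above) and pins cpu.uclamp.max to the same value on the leaf.

```python
from pathlib import Path


def uclamp_levels(cgroup_root, leaf):
    """Return each cgroup directory from just below the root down to leaf."""
    levels = []
    current = Path(cgroup_root)
    for part in Path(leaf).parts:
        current = current / part
        levels.append(current)
    return levels


def pin_uclamp(cgroup_root, leaf, percent):
    """Hypothetical helper: request min == max == percent for one leaf cgroup.

    cgroup_root is assumed to be the cgroup v2 mount point, and leaf a
    path relative to it, e.g. "system.slice/encoder.service".
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be within 0..100")
    levels = uclamp_levels(cgroup_root, leaf)
    if not levels:
        raise ValueError("leaf must name a cgroup below the root")

    value = f"{percent:.2f}"
    # The minimum has to be raised at every ancestor, or the leaf's
    # request does not reach frequency selection.
    for level in levels:
        (level / "cpu.uclamp.min").write_text(value)
    # The maximum only needs to be lowered on the leaf itself.
    (levels[-1] / "cpu.uclamp.max").write_text(value)
    return levels
```

Something like pin_uclamp("/sys/fs/cgroup", "system.slice/foo.service",
60.0) would then be called whenever the bpf scheduler wants a new
operating point, ideally no more often than the PELT signal can follow.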

[1] https://www.phoronix.com/review/schedutil-quirky-2023

[2] Does that still hold on high-end Android devices with one or two
hot-rodded prime cores?

Thanks,

--
Russell Haley