Re: [PATCH 1/1] sched/fair: Ignore OU for lone task on max-cap CPU
From: Pierre Gondois
Date: Fri Jan 09 2026 - 12:13:39 EST
Hello Christian,
On 12/30/25 10:30, Christian Loehle wrote:
Tasks that have an utilization high enough to trigger misfit or
NIT: an -> a
Also, I don't think a task can be misfit on a big CPU.
overutilized on a max-cap CPU don't have any better CPU to be placed
on, as long as this CPU isn't under significant thermal or system
pressure. There's no reason to let it trigger the global
overutilized state then.
Treat maximum capacity CPUs with just a single task as !overutilized
to let EAS decide placements on the remaining tasks and CPUs, it will
already avoid placing additional tasks on these CPUs as they don't have
any spare capacity.
Overutilized state is global to 1) ensure maximum throughput and 2)
prevent running find_energy_efficient_cpu() with unreliable PELT values
when compute capacity isn't provided to tasks.
1) remains trivially true as for CAS the same 1024-capacity CPU would
have been a correct choice for a lone task, too.
2) is guaranteed by limiting it to nr_running <= 1, the task itself
then has accurate PELT values as maximum compute capacity can be provided
(also ensured by subtracting system and thermal pressure from the CPU).
EAS will naturally not place additional tasks on the CPU as
find_energy_efficient_cpu() requires the task's utilization to fit onto
the spare-cap (util_fits_cpu()), of which there is none in the scenario
we are concerned with.
Signed-off-by: Christian Loehle <christian.loehle@xxxxxxx>
---
kernel/sched/fair.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da46c3164537..d885b2a0fcd3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6790,6 +6790,12 @@ static inline bool cpu_overutilized(int cpu)
 	if (!sched_energy_enabled())
 		return false;
+	/* Single task on max-cap CPU isn't misfit so no reason to trigger OU */
+	if (arch_scale_cpu_capacity(cpu) == SCHED_CAPACITY_SCALE &&
+	    cpu_rq(cpu)->nr_running <= 1 &&
+	    !capacity_greater(SCHED_CAPACITY_SCALE, capacity_of(cpu)))
+		return false;
+
(Just to discuss)
1.
capacity_of() takes the cpufreq pressure into account through
get_actual_cpu_capacity(). This means that on a platform where boost
frequencies exist but are not enabled (or where the allowed max freq. is
lower than the hardware max), the overutilized state will still be triggered.
So enabling boosting will actually lead to better energy placement.
IMO this is the right thing to do, as there might be multiple
clusters of big CPUs and some of them might be capped while others are not.
Sorting the different cases seems complicated, so your solution might
be the simplest/best.
2.
Tasks have a p->max_allowed_capacity property. But, as in 1.,
the cpufreq pressure is not taken into account there, so it is not
possible to use it.
3.
UCLAMP_MAX tasks already don't trigger the OU state.
UCLAMP_MIN tasks don't trigger it (if we are only looking at the
UCLAMP_MIN property, not the actual task utilization).
So the UCLAMP_* cases should be ok as well.
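
To make the interaction described in 1. concrete, here is a minimal
userspace sketch of the new condition. It is only an illustration under
simplifying assumptions: struct cpu_model, model_capacity_of(),
lone_task_skips_ou() and worked_cases() are hypothetical names that do not
exist in the kernel, and the modeled capacity is just "max capacity minus
pressure", which ignores the RT/DL/IRQ contributions that the real
capacity_of() also accounts for. The capacity_greater() margin is the one
from kernel/sched/fair.c.

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE 1024UL

/* Same ~5% margin as capacity_greater() in kernel/sched/fair.c. */
#define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)

/* Hypothetical stand-in for the per-CPU state read by the real helpers. */
struct cpu_model {
	unsigned long arch_capacity;	/* models arch_scale_cpu_capacity(cpu) */
	unsigned long pressure;		/* capacity lost to cpufreq/thermal/system pressure */
	unsigned int nr_running;	/* models cpu_rq(cpu)->nr_running */
};

/* Simplified capacity_of(): max capacity minus pressure, never below 1. */
static unsigned long model_capacity_of(const struct cpu_model *c)
{
	if (c->pressure >= c->arch_capacity)
		return 1;
	return c->arch_capacity - c->pressure;
}

/* True when the early "return false" added by the patch would be taken. */
static bool lone_task_skips_ou(const struct cpu_model *c)
{
	return c->arch_capacity == SCHED_CAPACITY_SCALE &&
	       c->nr_running <= 1 &&
	       !capacity_greater(SCHED_CAPACITY_SCALE, model_capacity_of(c));
}

/* Worked cases for the scenarios discussed in 1. */
static void worked_cases(void)
{
	struct cpu_model uncapped  = { 1024,   0, 1 };	/* full capacity available */
	struct cpu_model light_cap = { 1024,  40, 1 };	/* 984: within the margin */
	struct cpu_model boost_off = { 1024, 124, 1 };	/* 900: beyond the margin */
	struct cpu_model two_tasks = { 1024,   0, 2 };	/* not a lone task */

	assert(lone_task_skips_ou(&uncapped));
	assert(lone_task_skips_ou(&light_cap));
	assert(!lone_task_skips_ou(&boost_off));
	assert(!lone_task_skips_ou(&two_tasks));
}
```

In this model the exemption is lost once the modeled capacity drops to 972
or below, i.e. about 5% under SCHED_CAPACITY_SCALE, so a small cap still
skips the OU trigger while a larger one (such as the 900 used above for the
"boost not enabled" case) does not.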
------
So in the end I think it would be nice to have your patch.