[PATCH 0/3] sched: Generalize misfit load balance

From: Qais Yousef
Date: Fri Dec 08 2023 - 20:18:35 EST


Misfit load balance was added to help handle HMP systems, where we can make
a wrong decision at wake up thinking a task can run on a smaller core, but its
characteristics then change and it needs to migrate to a bigger core to meet
its performance demands.

With the addition of uclamp, we can encounter more cases where such wrong
placement decisions can be made and require the load balancer to take
corrective action.

Specifically, if a big task capped by uclamp_max was placed on a big core at
wake up because EAS thought it was the most energy efficient core at the time,
the dynamics of the system might change: other uncapped tasks might wake up on
the cluster, and there could then be a new, more energy efficient placement
for the capped task(s).

We can generalize the misfit load balance to handle different types of misfit
(whatever they may be) by simply giving it a reason. The reason can then decide
the type of action required.

The current misfit implementation is considered MISFIT_PERF, which means we
need to move a task to a better CPU to meet its performance requirement.

For UCLAMP_MAX I propose MISFIT_POWER, where we need to find a better placement
to control its impact on power.
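
As a rough illustration of the idea, here is a minimal userspace sketch of a
misfit reason selecting the corrective action. The enum values MISFIT_NONE,
the lb_action enum and misfit_action() are hypothetical names made up for this
example; only MISFIT_PERF and MISFIT_POWER come from the description above,
and the actual patches may structure this differently.

```c
/* Hypothetical names for illustration; the patches may differ. */
enum misfit_reason {
	MISFIT_NONE = 0,	/* assumption: task is fine where it is */
	MISFIT_PERF,		/* needs a better CPU to meet its perf demand */
	MISFIT_POWER,		/* needs a better placement for power */
};

/* Hypothetical set of corrective actions the load balancer could take. */
enum lb_action {
	LB_NOTHING,
	LB_MOVE_TO_BIGGER_CPU,
	LB_FIND_EFFICIENT_CPU,
};

/* The reason attached to the misfit task decides the type of action. */
static enum lb_action misfit_action(enum misfit_reason reason)
{
	switch (reason) {
	case MISFIT_PERF:
		return LB_MOVE_TO_BIGGER_CPU;
	case MISFIT_POWER:
		return LB_FIND_EFFICIENT_CPU;
	default:
		return LB_NOTHING;
	}
}
```

A future reason such as MISFIT_LATENCY would then only need a new enum value
and a new case, without touching the existing MISFIT_PERF path.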

Once we have an API to annotate latency sensitive tasks, it is anticipated
that a MISFIT_LATENCY load balance will be required to handle oversubscribed
situations by better distributing the latency sensitive tasks and reducing
their wake up latency.

Patch 1 splits misfit status update from misfit detection by adding a new
function is_misfit_task().
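
To show the shape of that split, here is a simplified userspace sketch. The
struct layouts and the fitness check are assumptions for illustration (the
real check in fair.c also considers uclamp and other state); only the function
name is_misfit_task() comes from the patch title. The 1280/1024 ratio mirrors
the roughly 20% headroom used by the kernel's fits_capacity() macro.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's task and runqueue (assumptions). */
struct task {
	unsigned long util;	/* estimated utilization */
	unsigned long h_load;	/* hierarchical load */
};

struct rq {
	unsigned long cpu_capacity;
	unsigned long misfit_task_load;	/* non-zero means "has a misfit task" */
};

/* Detection only, no side effects: does the task fail to fit this CPU? */
static bool is_misfit_task(const struct task *p, const struct rq *rq)
{
	if (p == NULL)
		return false;
	/* Fits if util is below capacity with ~20% headroom. */
	return !(p->util * 1280 < rq->cpu_capacity * 1024);
}

/* Status update: record the result of detection on the runqueue. */
static void update_misfit_status(const struct task *p, struct rq *rq)
{
	if (!is_misfit_task(p, rq)) {
		rq->misfit_task_load = 0;
		return;
	}
	/* Keep the value non-zero so it doubles as a flag. */
	rq->misfit_task_load = p->h_load ? p->h_load : 1;
}
```

Keeping detection free of side effects lets other code paths ask whether a
task is misfit without changing the runqueue state.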

Patch 2 implements the generalization logic by adding a misfit reason,
propagating it correctly, and guarding the current misfit code with the
MISFIT_PERF reason.

Patch 3 is an RFC on a potential implementation for MISFIT_POWER.

Patches 1 and 2 were tested standalone with no regression observed. They should
not introduce a functional change and can be considered for merging if they
make sense, after addressing any review comments.

Patch 3 was only tested to verify it does what I expected it to do; no real
power/perf testing was done, mainly because I was expecting to remove uclamp
max-aggregation [1] and the RFC I currently have (which I wrote many months
ago) is tied to detecting a task being uncapped by max-aggregation.
I need to rethink the detection mechanism.

Besides that, the logic relies on using find_energy_efficient_cpu() to find the
best potential new placement for the task. To do that, though, we need to force
every CPU to do the MISFIT_POWER load balance, as we don't know which CPU should
do the pull. There might be better ways to handle this, so feedback and
thoughts would be appreciated.
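
One possible reading of the "every CPU does the balance" approach, sketched in
userspace C. Everything here is an assumption for illustration:
pick_efficient_cpu() is a mock stand-in for find_energy_efficient_cpu() that
uses a made-up per-CPU cost, and should_pull() is a hypothetical helper, not
code from patch 3.

```c
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical per-CPU data: capacity and a made-up energy cost. */
struct cpu_info {
	unsigned long capacity;
	unsigned long cost;
};

struct task {
	int cpu;		/* CPU the task currently runs on */
	unsigned long util;
};

/*
 * Mock stand-in for find_energy_efficient_cpu(): the cheapest CPU that can
 * still fit the task, or the task's current CPU if none can.
 */
static int pick_efficient_cpu(const struct cpu_info *cpus, const struct task *p)
{
	int best = p->cpu;
	unsigned long best_cost = (unsigned long)-1;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (p->util > cpus[cpu].capacity)
			continue;
		if (cpus[cpu].cost < best_cost) {
			best_cost = cpus[cpu].cost;
			best = cpu;
		}
	}
	return best;
}

/* Every CPU runs this check; only the chosen destination decides to pull. */
static bool should_pull(const struct cpu_info *cpus, int this_cpu,
			const struct task *p)
{
	return this_cpu != p->cpu && pick_efficient_cpu(cpus, p) == this_cpu;
}
```

The cost of this scheme is visible in the sketch: each CPU repeats the same
placement search even though at most one of them ends up pulling.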

[1] https://lore.kernel.org/lkml/20231208015242.385103-1-qyousef@xxxxxxxxxxx/

Thanks!

--
Qais Yousef

Qais Yousef (3):
  sched/fair: Add is_misfit_task() function
  sched/fair: Generalize misfit lb by adding a misfit reason
  sched/fair: Implement new type of misfit MISFIT_POWER

 kernel/sched/fair.c  | 115 +++++++++++++++++++++++++++++++++++++------
 kernel/sched/sched.h |   9 ++++
 2 files changed, 110 insertions(+), 14 deletions(-)

--
2.34.1