[tip: sched/core] sched/fair: Fix cpu_util runnable_avg arithmetic

From: tip-bot2 for Hongyan Xia

Date: Tue Jun 09 2026 - 04:41:24 EST


The following commit has been merged into the sched/core branch of tip:

Commit-ID: 29922fdfc2a4008d66418bedd0ebf5038fc54efa
Gitweb: https://git.kernel.org/tip/29922fdfc2a4008d66418bedd0ebf5038fc54efa
Author: Hongyan Xia <hongyan.xia@xxxxxxxxxxxxx>
AuthorDate: Fri, 05 Jun 2026 09:43:39
Committer: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
CommitterDate: Tue, 09 Jun 2026 10:28:08 +02:00

sched/fair: Fix cpu_util runnable_avg arithmetic

When cpu_util() takes runnable_avg from max(runnable_avg, util_avg), the
task contribution that is then added or subtracted should also be the
task's runnable_avg. The arithmetic that follows still uses the task's
util_avg, which mixes runnable_avg with util_avg and is incorrect.

Fix by always doing the runnable_avg arithmetic with the task's
runnable_avg, and only taking max(runnable_avg, util_avg) at the last step.
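
The effect of the reordering can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not the kernel
code: struct sig, max_ul(), sub_positive(), cpu_util_old() and
cpu_util_new() are hypothetical names made up for the illustration, the
add_task/sub_task flags are passed in directly instead of being derived
from task_cpu(p) and dst_cpu, and the UTIL_EST handling that follows in
cpu_util() is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two PELT signals; not a kernel type. */
struct sig {
	unsigned long util_avg;
	unsigned long runnable_avg;
};

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Subtraction clamped at zero, in the spirit of lsub_positive(). */
static unsigned long sub_positive(unsigned long a, unsigned long b)
{
	return a > b ? a - b : 0;
}

/* Old order: take the max first, then adjust with the task's util_avg. */
static unsigned long cpu_util_old(struct sig cpu, struct sig task,
				  bool add_task, bool sub_task, bool boost)
{
	unsigned long util = cpu.util_avg;

	if (boost)
		util = max_ul(util, cpu.runnable_avg);

	if (sub_task)
		util = sub_positive(util, task.util_avg);
	else if (add_task)
		util += task.util_avg;

	return util;
}

/* New order: adjust each signal with its own task term, max last. */
static unsigned long cpu_util_new(struct sig cpu, struct sig task,
				  bool add_task, bool sub_task, bool boost)
{
	unsigned long util = cpu.util_avg;

	if (add_task)
		util += task.util_avg;
	else if (sub_task)
		util = sub_positive(util, task.util_avg);

	if (boost) {
		unsigned long runnable = cpu.runnable_avg;

		if (add_task)
			runnable += task.runnable_avg;
		else if (sub_task)
			runnable = sub_positive(runnable, task.runnable_avg);
		util = max_ul(util, runnable);
	}

	return util;
}
```

With made-up values of util_avg = 300 and runnable_avg = 500 for the CPU,
and util_avg = 100 and runnable_avg = 250 for the task, adding the task
with boost set gives 500 + 100 = 600 in the old order and
max(400, 750) = 750 in the new one. Removing the task gives
500 - 100 = 400 in the old order and max(200, 250) = 250 in the new one.
Without boost both orders agree.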

Fixes: 7d0583cf9ec7 ("sched/fair, cpufreq: Introduce 'runnable boosting'")
Signed-off-by: Hongyan Xia <hongyan.xia@xxxxxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Reviewed-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
Link: https://patch.msgid.link/20260605094318.37931-1-hongyan.xia@xxxxxxxxxxxxx
---
kernel/sched/fair.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f4ed841..1b23e73 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8968,25 +8968,32 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 static unsigned long
 cpu_util(int cpu, struct task_struct *p, int dst_cpu, int boost)
 {
+	bool add_task = p && task_cpu(p) != cpu && dst_cpu == cpu;
+	bool sub_task = p && task_cpu(p) == cpu && dst_cpu != cpu;
 	struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
 	unsigned long util = READ_ONCE(cfs_rq->avg.util_avg);
 	unsigned long runnable;
 
-	if (boost) {
-		runnable = READ_ONCE(cfs_rq->avg.runnable_avg);
-		util = max(util, runnable);
-	}
-
 	/*
 	 * If @dst_cpu is -1 or @p migrates from @cpu to @dst_cpu remove its
 	 * contribution. If @p migrates from another CPU to @cpu add its
 	 * contribution. In all the other cases @cpu is not impacted by the
 	 * migration so its util_avg is already correct.
 	 */
-	if (p && task_cpu(p) == cpu && dst_cpu != cpu)
-		lsub_positive(&util, task_util(p));
-	else if (p && task_cpu(p) != cpu && dst_cpu == cpu)
+	if (add_task)
 		util += task_util(p);
+	else if (sub_task)
+		lsub_positive(&util, task_util(p));
+
+	if (boost) {
+		runnable = READ_ONCE(cfs_rq->avg.runnable_avg);
+		if (add_task)
+			runnable += READ_ONCE(p->se.avg.runnable_avg);
+		else if (sub_task)
+			lsub_positive(&runnable,
+				      READ_ONCE(p->se.avg.runnable_avg));
+		util = max(util, runnable);
+	}
 
 	if (sched_feat(UTIL_EST)) {
 		unsigned long util_est;