Re: [PATCH v2 08/10] sched/fair: Add newidle balance to pick_task_fair()

From: Peter Zijlstra

Date: Thu Jun 11 2026 - 07:33:52 EST



Aaron,

Sorry I failed to notice this email earlier.

On Wed, Jun 03, 2026 at 05:51:08PM +0800, Aaron Lu wrote:

> I applied below diff and the problem is gone:
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5f48af700fd44..942a543af3e54 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9897,6 +9897,9 @@ static struct task_struct *pick_task_fair(struct rq *rq, struct rq_flags *rf)
>  	return p;
>
>  idle:
> +	if (sched_core_enabled(rq))
> +		return NULL;
> +
>  	new_tasks = sched_balance_newidle(rq, rf);
>  	if (new_tasks < 0)
>  		return RETRY_TASK;
>

Right, this is the safe patch and restores pick_task_fair() to its
previous status (for core-sched).
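
For anyone following along, here is a rough userspace sketch of why the
guard matters. It is a toy model under loud assumptions, not kernel code:
every toy_* name is made up for illustration, each runqueue holds at most
one task, and locking is left out entirely. It only shows that a pick
which is allowed to pull from the sibling can leave the sibling's earlier
core_pick pointing at a task that no longer lives on that runqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; all toy_* names are hypothetical, not kernel API. */
struct toy_rq;

struct toy_task {
	struct toy_rq *rq;		/* stands in for task_rq(p) */
};

struct toy_rq {
	int cpu;
	struct toy_task *queued;	/* at most one runnable task */
	struct toy_task *core_pick;	/* stands in for rq->core_pick */
};

/* Models newidle balance: pull the sibling's task if it has one. */
static bool toy_newidle(struct toy_rq *rq, struct toy_rq *sibling)
{
	if (!sibling->queued)
		return false;

	rq->queued = sibling->queued;
	sibling->queued = NULL;
	rq->queued->rq = rq;		/* the task now lives on rq */
	return true;
}

/* Models the pick; 'patched' selects whether the guard is present. */
static struct toy_task *toy_pick(struct toy_rq *rq, struct toy_rq *sibling,
				 bool patched)
{
	if (rq->queued)
		return rq->queued;

	/* The guard: never do newidle while picking core-wide. */
	if (patched)
		return NULL;

	return toy_newidle(rq, sibling) ? rq->queued : NULL;
}

/* One core-wide pick over both siblings; true if rq_57's pick went stale. */
static bool toy_core_pick_is_stale(bool patched)
{
	struct toy_rq rq57 = { .cpu = 57 }, rq56 = { .cpu = 56 };
	struct toy_task a = { .rq = &rq57 };

	rq57.queued = &a;

	rq57.core_pick = toy_pick(&rq57, &rq56, patched);
	rq56.core_pick = toy_pick(&rq56, &rq57, patched);

	return rq57.core_pick && rq57.core_pick->rq != &rq57;
}

/* Checks both variants of the model. */
static void toy_self_check(void)
{
	assert(toy_core_pick_is_stale(false));	/* unguarded: stale pick */
	assert(!toy_core_pick_is_stale(true));	/* guarded: pick stays valid */
}
```

In the unguarded case both core_pick pointers end up naming the same
task, which is the condition the WARN in hrtick_start_fair() trips over.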

Since people are hitting this problem, I'm going to merge it as below.
I've presumed your SoB, please let me know if that's a problem.

I think I'm going to try and move newidle into sched_class::balance /
balance_fair(), but I'll do that next cycle.

Thanks!

---
Subject: sched/fair: Fix newidle vs core-sched
From: "Aaron Lu" <ziqianlu@xxxxxxxxxxxxx>
Date: Wed, 3 Jun 2026 17:51:08 +0800

From: "Aaron Lu" <ziqianlu@xxxxxxxxxxxxx>

While testing Prateek's throttle series, I noticed a panic when core
scheduling is enabled and bisected it to the patch "sched/fair: Add
newidle balance to pick_task_fair()".

I fed the panic log and that patch to an agent and its analysis looks
correct to me (cpu56 and cpu57 are siblings in a VM):

cpu57 (holds core-wide lock)

pick_next_task()  [core scheduling]
  for_each_cpu_wrap(i, smt_mask, 57):
    i=57: pick_task(rq_57)
            pick_task_fair(rq_57)
              -> picks task A
          rq_57->core_pick = task A
          // task_rq(A) == rq_57

    i=56: pick_task(rq_56)
            pick_task_fair(rq_56)
              cfs_rq->nr_queued == 0
              goto idle
              sched_balance_newidle(rq_56)
                raw_spin_rq_unlock(rq_56)
                // core-wide lock released
                newidle_balance() pulls
                  task A: rq_57 -> rq_56
                // task_rq(A) == rq_56 now
                raw_spin_rq_lock(rq_56)
                // core-wide lock re-acquired
              return > 0
              goto again
            pick_task_fair(rq_56)
              -> picks task A
          rq_56->core_pick = task A

  // first loop done
  // rq_57->core_pick is still task A (set before lock release)
  // but task_rq(A) == rq_56 now
  next = rq_57->core_pick  // = task A

  put_prev_set_next_task(rq_57, prev, task A)
    __set_next_task_fair(rq_57, task A)
      hrtick_start_fair(rq_57, task A)
        WARN_ON_ONCE(task_rq(task A) != rq_57)
        // task_rq(A) == rq_56

IOW: by allowing pick_task_fair() to do newidle_balance without returning
RETRY_TASK, the core-wide pick can end up selecting the same task on two
CPUs. Restore the previous state by never doing newidle when core
scheduling is enabled.

Tested-by: Sven Schnelle <svens@xxxxxxxxxxxxx>
Signed-off-by: "Aaron Lu" <ziqianlu@xxxxxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Link: https://patch.msgid.link/20260603095108.GA1684319@xxxxxxxxxxxxx
---
kernel/sched/fair.c | 3 +++
1 file changed, 3 insertions(+)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9942,6 +9942,9 @@ struct task_struct *pick_task_fair(struc
 	return p;
 
 idle:
+	if (sched_core_enabled(rq))
+		return NULL;
+
 	new_tasks = sched_balance_newidle(rq, rf);
 	if (new_tasks < 0)
 		return RETRY_TASK;