Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled

From: K Prateek Nayak
Date: Fri Nov 29 2024 - 12:47:11 EST


Hello Peter,

On 11/29/2024 3:25 PM, Peter Zijlstra wrote:
On Thu, Nov 28, 2024 at 12:59:54PM +0530, K Prateek Nayak wrote:

I tried to put a SCHED_WARN_ON() here to track where this comes from and
seems like it is usually from attach_task() in the load balancing path
pulling a delayed task which is set as the next buddy in
check_preempt_wakeup_fair()

Can you please try the following diff instead of the first two patches
and see if you still hit these warnings, stalls, and pick_eevdf()
returning NULL?

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ff7cae9274c5..61e74eb5af22 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5478,6 +5478,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
bool sleep = flags & DEQUEUE_SLEEP;
update_curr(cfs_rq);
+ clear_buddies(cfs_rq, se);
if (flags & DEQUEUE_DELAYED) {
SCHED_WARN_ON(!se->sched_delayed);
@@ -5520,8 +5521,6 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
update_stats_dequeue_fair(cfs_rq, se, flags);
- clear_buddies(cfs_rq, se);
-
update_entity_lag(cfs_rq, se);
if (sched_feat(PLACE_REL_DEADLINE) && !sleep) {
se->deadline -= se->vruntime;

So this puts the clear_buddies() before the whole delayed thing, and
should be sufficient afaict, no?

@@ -8767,7 +8766,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
if (unlikely(throttled_hierarchy(cfs_rq_of(pse))))
return;
- if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK)) {
+ if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && !pse->sched_delayed) {
set_next_buddy(pse);
}

But then this should never happen, which is after a wakeup, p and the
whole hierarchy up should be runnable at this point.

Or should I go find more wake-up juice and try again :-)

The motivation there was this splat from my testing:

Kernel panic - not syncing: Encountered delayed entity in check_preempt_wakeup_fair()
CPU: 190 UID: 1000 PID: 4215 Comm: sched-messaging Tainted: G W 6.12.0-rc4-test+ #156
Tainted: [W]=WARN
Hardware name: Dell Inc. PowerEdge R6525/024PW1, BIOS 2.7.3 03/30/2022
Call Trace:
<TASK>
panic+0x399/0x3f0
check_preempt_wakeup_fair+0x21b/0x220
wakeup_preempt+0x64/0x70
sched_balance_rq+0x970/0x12e0
? update_load_avg+0x7e/0x7e0
sched_balance_newidle+0x1e2/0x490
pick_next_task_fair+0x32/0x3b0
__pick_next_task+0x3d/0x1a0
__schedule+0x1da/0x1540
? srso_alias_return_thunk+0x5/0xfbef5
? aa_file_perm+0x121/0x4d0
schedule+0x28/0x110
pipe_read+0x345/0x470
? __pfx_autoremove_wake_function+0x10/0x10
vfs_read+0x2f1/0x330
ksys_read+0xaf/0xe0
do_syscall_64+0x6f/0x110
? srso_alias_return_thunk+0x5/0xfbef5
? current_time+0x31/0xf0
? srso_alias_return_thunk+0x5/0xfbef5
? atime_needs_update+0x9c/0x110
? srso_alias_return_thunk+0x5/0xfbef5
? touch_atime+0x1e/0x100
? srso_alias_return_thunk+0x5/0xfbef5
? pipe_read+0x3ec/0x470
? srso_alias_return_thunk+0x5/0xfbef5
? vfs_read+0x2f1/0x330
? srso_alias_return_thunk+0x5/0xfbef5
? ksys_read+0xcc/0xe0
? srso_alias_return_thunk+0x5/0xfbef5
? syscall_exit_to_user_mode+0x51/0x1a0
? srso_alias_return_thunk+0x5/0xfbef5
? do_syscall_64+0x7b/0x110
? ksys_read+0xcc/0xe0
? srso_alias_return_thunk+0x5/0xfbef5
? syscall_exit_to_user_mode+0x51/0x1a0
? srso_alias_return_thunk+0x5/0xfbef5
? do_syscall_64+0x7b/0x110
? srso_alias_return_thunk+0x5/0xfbef5
? ksys_read+0xcc/0xe0
? do_syscall_64+0x7b/0x110
? srso_alias_return_thunk+0x5/0xfbef5
? syscall_exit_to_user_mode+0x51/0x1a0
? srso_alias_return_thunk+0x5/0xfbef5
? do_syscall_64+0x7b/0x110
? srso_alias_return_thunk+0x5/0xfbef5
? do_syscall_64+0x7b/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa8b851481c
Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 e9 c1 f7 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 2f c2 f7 ff 48
RSP: 002b:00007fa8ae0d8cb0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa8b851481c
RDX: 0000000000000064 RSI: 00007fa8ae0d8cf0 RDI: 0000000000000027
RBP: 00007fa8ae0d8d90 R08: 0000000000000000 R09: 00007ffee0fdf97f
R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000064
R13: 00007fa8ae0d8cf0 R14: 00000000000005aa R15: 000055c2e80accb0
</TASK>
--

newidle balance pulls a delayed entity which goes through the
check_preempt_wakeup_fair() path in attach_task() and is set as the next
buddy. On a second thought this is perhaps not required since even if
this delayed entity is picked, it'll go thorough a full dequeue and the
clear_buddies() change above should take care of it.




Anyway.. I'm sure I started a patch series cleaning up the whole next
buddy thing months ago (there's more problems here), but I can't seem to
find it in a hurry :/




--
Thanks and Regards,
Prateek