Re: [PATCH 03/15] sched/fair: Add lag based placement
From: Breno Leitao
Date: Fri Feb 07 2025 - 05:07:34 EST
Hello Peter,
On Wed, May 31, 2023 at 01:58:42PM +0200, Peter Zijlstra wrote:
>
> place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> {
<snip>
> - vruntime -= thresh;
> + lag *= load + se->load.weight;
> + if (WARN_ON_ONCE(!load))
I have 6.13 running on some hosts, and in some cases, when the system
is hitting OOMs, I see the following warning:
WARNING: CPU: 29 PID: 593474 at kernel/sched/fair.c:5250 place_entity+0x199/0x1b0
Call Trace:
<TASK>
? __warn+0xd1/0x1b0
? place_entity+0x199/0x1b0
? report_bug+0x140/0x1c0
? handle_bug+0x5e/0x90
? exc_invalid_op+0x16/0x40
? asm_exc_invalid_op+0x16/0x20
? place_entity+0x199/0x1b0
reweight_entity+0x188/0x200
enqueue_task_fair.llvm.15448040313737105663+0x28c/0x560
enqueue_task+0x30/0x120
ttwu_do_activate+0x99/0x230
try_to_wake_up+0x25a/0x4a0
? hrtimer_dummy_timeout+0x10/0x10
hrtimer_wakeup+0x25/0x30
__hrtimer_run_queues+0xf1/0x250
hrtimer_interrupt+0xfb/0x220
__sysvec_apic_timer_interrupt+0x47/0x140
sysvec_apic_timer_interrupt+0x35/0x80
asm_sysvec_apic_timer_interrupt+0x16/0x20
I am sorry for not decoding the stack; I am having a hard time
decoding it properly. The values I got were misleading, and I am
still working to understand what is happening.
Anyway, I don't have a reproducer, and this problem doesn't happen
frequently enough. I have 1K hosts running 6.13 and I saw it 5 times
in the last week.
Also, this is happening in 6.13.1.
Thanks
--breno