Re: [REGRESSION] Re: [PATCH 00/24] Complete EEVDF
From: Peter Zijlstra
Date: Wed Jan 08 2025 - 08:12:16 EST
I failed to realize the follow-up email was private, so I'm duplicating it
here, along with some new content :-)
On Tue, Jan 07, 2025 at 09:15:59PM -0800, Doug Smythies wrote:
> On 2025.07.11:24 Peter Zijlstra wrote:
> > What exact cgroup config are you having? /sys/kernel/debug/sched/debug
> > should be able to tell you.
>
> I do not know.
> I'll capture the above output, compress it, and send it to you.
>
> I did also boot with systemd.unified_cgroup_hierarchy=0
> and it made no difference.
I think you need to boot with "cgroup_disable=cpu noautogroup" to fully
disable all the cpu-cgroup muck. Anyway:
$ zcat cgroup2.txt.gz | grep -e yes -e turbo | awk '{print $2 "\t" $16}'
yes /user.slice/user-1000.slice/session-1.scope
yes /user.slice/user-1000.slice/session-1.scope
yes /user.slice/user-1000.slice/session-1.scope
yes /user.slice/user-1000.slice/session-1.scope
turbostat /autogroup-286
yes /user.slice/user-1000.slice/session-1.scope
yes /user.slice/user-1000.slice/session-1.scope
yes /user.slice/user-1000.slice/session-1.scope
yes /user.slice/user-1000.slice/session-1.scope
yes /user.slice/user-1000.slice/session-1.scope
yes /user.slice/user-1000.slice/session-1.scope
yes /user.slice/user-1000.slice/session-1.scope
yes /user.slice/user-1000.slice/session-1.scope
turbostat /autogroup-286
That matches the scenario I could reproduce: two competing groups.
I'm seeing wild vruntime divergence when this happens; this is
definitely wonky. Basically the turbostat group gets starved for a
while as the yes group catches up.
It looks like reweight_entity() is shooting the cgroup entity out to the
right (towards larger vruntime).
So it builds up some negative lag (it received surplus service), and then,
because turbostat goes to sleep for a second, its cgroup's share gets
truncated to 2 and it shoots the cgroup entity out waaaaaaaay far.
Thing is, waking up *should* fix that up again, but that doesn't appear
to happen, leaving us up a creek.
/me noodles a bit....
Does this help?
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c0e58e51801f..daa62cfa3092 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7000,6 +7063,13 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	if (flags & ENQUEUE_DELAYED) {
 		requeue_delayed_entity(se);
+		se = se->parent;
+		for_each_sched_entity(se) {
+			cfs_rq = cfs_rq_of(se);
+			update_load_avg(cfs_rq, se, UPDATE_TG);
+			se_update_runnable(se);
+			update_cfs_group(se);
+		}
 		return;
 	}