Re: wakeup_affine_weight() is b0rked - was Re: [PATCH 2/2] sched/fair: Scale wakeup granularity relative to nr_running
From: Mike Galbraith
Date: Sun Oct 03 2021 - 10:53:23 EST
On Sun, 2021-10-03 at 20:34 +1300, Barry Song wrote:
> >
> > I looked into that crazy stacking depth...
> >
> > static int
> > wake_affine_weight(struct sched_domain *sd, struct task_struct *p,
> >                    int this_cpu, int prev_cpu, int sync)
> > {
> >         s64 this_eff_load, prev_eff_load;
> >         unsigned long task_load;
> >
> >         this_eff_load = cpu_load(cpu_rq(this_cpu));
> >                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^ the butler didit!
> >
> > That's pretty darn busted as it sits. Between load updates, X, or any
> > other waker of many, can stack wakees to a ludicrous depth. Tracing
> > kbuild vs firefox playing a youtube clip, I watched X stack 20 of the
> > zillion firefox minions while their previous CPUs all had 1 lousy task
> > running but a cpu_load() higher than the cpu_load() of X's CPU. Most
> > of those prev_cpus were where X had left them when it migrated. Each
> > and every crazy depth migration was wake_affine_weight() deciding we
> > should pull based on crappy data. As instantaneous load on the waker
> > CPU blew through the roof in my trace snapshot, its cpu_load() did
> > finally budge.. a tiny bit.. downward. No idea where the stack would
> > have topped out, my tracing_off() limit was 20.
>
> Mike, I'm not quite sure I caught your point. It seems you mean X wakes up
> many firefoxes within a short period, so it pulls them to the CPU where X
> is running. Technically those pulls should increase the cpu_load of X's CPU.
> But for some reason, the cpu_load is not increased in time on X's CPU,
> so a lot of firefoxes pile up on X's CPU, while at that time the load
> of the CPU each firefox was running on is still larger than that of X's CPU
> with a lot of firefoxes on it?
It looked like this.
X-2211 [007] d...211 2327.810997: select_task_rq_fair: this_run/load:4:373 prev_run/load:4:373 waking firefox:4971 CPU7 ==> CPU7
X-2211 [007] d...211 2327.811004: select_task_rq_fair: this_run/load:5:373 prev_run/load:1:1029 waking QXcbEventQueue:4952 CPU0 ==> CPU7
X-2211 [007] d...211 2327.811010: select_task_rq_fair: this_run/load:6:373 prev_run/load:1:1528 waking QXcbEventQueue:3969 CPU5 ==> CPU7
X-2211 [007] d...211 2327.811015: select_task_rq_fair: this_run/load:7:373 prev_run/load:1:1029 waking evolution-alarm:3833 CPU0 ==> CPU7
X-2211 [007] d...211 2327.811021: select_task_rq_fair: this_run/load:8:373 prev_run/load:1:1528 waking QXcbEventQueue:3860 CPU5 ==> CPU7
X-2211 [007] d...211 2327.811026: select_task_rq_fair: this_run/load:8:373 prev_run/load:1:1528 waking QXcbEventQueue:3800 CPU5 ==> CPU7
X-2211 [007] d...211 2327.811032: select_task_rq_fair: this_run/load:9:373 prev_run/load:1:1528 waking xdg-desktop-por:3341 CPU5 ==> CPU7
X-2211 [007] d...211 2327.811037: select_task_rq_fair: this_run/load:10:373 prev_run/load:1:289 waking at-spi2-registr:3165 CPU4 ==> CPU7
X-2211 [007] d...211 2327.811042: select_task_rq_fair: this_run/load:11:373 prev_run/load:1:1029 waking ibus-ui-gtk3:2865 CPU0 ==> CPU0
X-2211 [007] d...211 2327.811049: select_task_rq_fair: this_run/load:11:373 prev_run/load:1:226 waking ibus-x11:2868 CPU2 ==> CPU2
X-2211 [007] d...211 2327.811054: select_task_rq_fair: this_run/load:11:373 prev_run/load:11:373 waking ibus-extension-:2866 CPU7 ==> CPU7
X-2211 [007] d...211 2327.811059: select_task_rq_fair: this_run/load:12:373 prev_run/load:1:289 waking QXcbEventQueue:2804 CPU4 ==> CPU7
X-2211 [007] d...211 2327.811063: select_task_rq_fair: this_run/load:13:373 prev_run/load:1:935 waking QXcbEventQueue:2756 CPU1 ==> CPU7
X-2211 [007] d...211 2327.811068: select_task_rq_fair: this_run/load:14:373 prev_run/load:1:1528 waking QXcbEventQueue:2753 CPU5 ==> CPU7
X-2211 [007] d...211 2327.811074: select_task_rq_fair: this_run/load:15:373 prev_run/load:1:1528 waking QXcbEventQueue:2741 CPU5 ==> CPU7
X-2211 [007] d...211 2327.811079: select_task_rq_fair: this_run/load:16:373 prev_run/load:1:1528 waking QXcbEventQueue:2730 CPU5 ==> CPU7
X-2211 [007] d...211 2327.811085: select_task_rq_fair: this_run/load:17:373 prev_run/load:1:5 waking QXcbEventQueue:2724 CPU0 ==> CPU0
X-2211 [007] d...211 2327.811090: select_task_rq_fair: this_run/load:17:373 prev_run/load:1:1010 waking QXcbEventQueue:2721 CPU6 ==> CPU7
X-2211 [007] d...211 2327.811096: select_task_rq_fair: this_run/load:18:373 prev_run/load:1:1528 waking QXcbEventQueue:2720 CPU5 ==> CPU7
X-2211 [007] d...211 2327.811101: select_task_rq_fair: this_run/load:19:373 prev_run/load:1:1528 waking QXcbEventQueue:2704 CPU5 ==> CPU7
X-2211 [007] d...211 2327.811105: select_task_rq_fair: this_run/load:20:373 prev_run/load:0:226 waking QXcbEventQueue:2705 CPU2 ==> CPU2
X-2211 [007] d...211 2327.811110: select_task_rq_fair: this_run/load:19:342 prev_run/load:1:1528 waking QXcbEventQueue:2695 CPU5 ==> CPU7
X-2211 [007] d...211 2327.811115: select_task_rq_fair: this_run/load:20:342 prev_run/load:1:1528 waking QXcbEventQueue:2694 CPU5 ==> CPU7
X-2211 [007] d...211 2327.811120: select_task_rq_fair: this_run/load:21:342 prev_run/load:1:1528 waking QXcbEventQueue:2679 CPU5 ==> CPU7
Legend: foo_run/load:foo->nr_running:cpu_load(foo)
Every migration to CPU7 in the above was due to wake_affine_weight()
seeing more or less static effective load numbers (the trace was wider,
showing which path was taken).
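To make the mechanism concrete, here is a minimal userspace sketch of the
comparison. It is not the kernel code: it leaves out WA_BIAS, capacity
scaling and the sync path, and struct cpu_model, model_wake_affine_weight(),
stack_wakees() and the wakee load of 50 are all made up for illustration.
Only the 373 and 1528 load values come from the trace above. The point is
just that the decision reads the averaged load, which does not move while
nr_running climbs.

```c
/* Simplified model only: no WA_BIAS, no capacity scaling, no sync handling. */
struct cpu_model {
	unsigned int nr_running;	/* instantaneous, bumped on every enqueue */
	long load;			/* averaged load, refreshed only now and then */
};

/* Returns 1 if the wakee would be pulled to the waker's CPU. */
static int model_wake_affine_weight(const struct cpu_model *this_cpu,
				    const struct cpu_model *prev_cpu,
				    long task_load)
{
	long this_eff_load = this_cpu->load + task_load;
	long prev_eff_load = prev_cpu->load - task_load;

	return this_eff_load < prev_eff_load;
}

/*
 * Wake 'count' tasks whose previous CPUs each run one task with averaged
 * load 'prev_load'. The waker's load is deliberately left stale between
 * wakeups, mimicking the window between load updates. Returns the number
 * of wakees stacked on the waker's CPU.
 */
static unsigned int stack_wakees(struct cpu_model *waker, long prev_load,
				 long task_load, unsigned int count)
{
	unsigned int i, pulled = 0;

	for (i = 0; i < count; i++) {
		struct cpu_model prev = { .nr_running = 1, .load = prev_load };

		if (model_wake_affine_weight(waker, &prev, task_load)) {
			waker->nr_running++;	/* reality changes... */
			pulled++;		/* ...waker->load does not */
		}
	}
	return pulled;
}
```

Starting the waker at nr_running 4 and load 373 and calling
stack_wakees(&waker, 1528, 50, 16) pulls all 16 wakees: nr_running ends up
at 20 while load still reads 373, since nothing in the comparison ever
looks at nr_running.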
> I am wondering if this should be the responsibility of wake_wide()?
That's a good point. I'm not so sure that would absolve use of what
appears to be stagnant state though. If we hadn't gotten there, this
stack obviously wouldn't have happened.. but we did get there, and
state that was used did not reflect reality. wake_wide() deflecting
this particular gaggle wouldn't have improved state accuracy one whit for a
subsequent wakeup, or?
-Mike