Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes

From: Peter Zijlstra
Date: Tue Oct 18 2016 - 05:08:10 EST


On Mon, Oct 17, 2016 at 11:52:39PM +0100, Dietmar Eggemann wrote:
>
> Something looks weird related to the use of for_each_possible_cpu(i) in
> online_fair_sched_group() on my i5-3320M CPU (4 logical cpus).
>
> In case I print out cpu id and the cpu masks inside the for_each_possible_cpu(i)
> I get:
>
> [ 5.462368] cpu=0 cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3

OK, you have a buggy BIOS :-) It enumerates too many CPU slots. There is
no reason to have 4 empty CPU slots on a machine that cannot do physical
hotplug.

This also explains why it doesn't show on many machines, most machines
will not have this and possible_mask == present_mask == online_mask for
most all cases.

x86 folk, can we detect the lack of physical hotplug capability and FW_WARN
on this and lowering possible_mask to present_mask?

> [ 5.462370] cpu=1 cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
> [ 5.462370] cpu=2 cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
> [ 5.462371] cpu=3 cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
> [ 5.462372] *cpu=4* cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
> [ 5.462373] *cpu=5* cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
> [ 5.462374] *cpu=6* cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
> [ 5.462375] *cpu=7* cpu_possible_mask=0-7 cpu_online_mask=0-3 cpu_present_mask=0-3 cpu_active_mask=0-3
>
> T430:/sys/fs/cgroup/cpu,cpuacct/system.slice# ls -l | grep '^d' | wc -l
> 80
>
> /proc/sched_debug:
>
> cfs_rq[0]:/system.slice
> ...
> .tg_load_avg : 323584
> ...
>
> 80 * 1024 * 4 (not existent cpu4-cpu7) = 327680 (with a little bit of decay,
> this could be this extra load on the systen.slice tg)
>
> Using for_each_online_cpu(i) instead of for_each_possible_cpu(i) in
> online_fair_sched_group() works on this machine, i.e. the .tg_load_avg
> of system.slice tg is 0 after startup.

Right, so the reason for using present_mask is that it avoids having to
deal with hotplug, also all the per-cpu memory is allocated and present
for !online CPUs anyway, so might as well set it up properly anyway.

(You might want to start booting your laptop with "possible_cpus=4" to
save some memory FWIW.)

But yes, we have a bug here too... /me ponders

So aside from funny BIOSes, this should also show up when creating
cgroups when you have offlined a few CPUs, which is far more common I'd
think.

On IRC you mentioned that adding list_add_leaf_cfs_rq() to
online_fair_sched_group() cures this, this would actually match with
unregister_fair_sched_group() doing list_del_leaf_cfs_rq() and avoid
a few instructions on the enqueue path, so that's all good.

I'm just not immediately seeing how that cures things. The only relevant
user of the leaf_cfs_rq list seems to be update_blocked_averages() which
is called from the balance code (idle_balance() and
rebalance_domains()). But neither should call that for offline (or
!present) CPUs.


Humm..