Re: [PATCH 0/1] sched/fair: Fix unfairness caused by missing load decay
From: Odin Ugedal
Date: Mon Apr 26 2021 - 12:33:55 EST
Hi,
> Have you been able to reproduce this on mainline ?
Yes. I have been debugging and testing with v5.12-rc8. After I found
the suspected
commit in ~v4.8, I compiled both the v4.4.267 and v4.9.267, and was able to
successfully reproduce it on v4.9.267 and not on v4.4.267. It is also
reproducible
on 5.11.16-arch1-1 that my distro ships, and it is reproducible on all
the machines
I have tested.
> When running the script below on v5.12, I'm not able to reproduce your problem
v5.12 is pretty fresh, so I have not tested on anything before v5.12-rc8. I did
compile v5.12.0 now, and I am able to reproduce it there as well.
Which version did you try (the one for cgroup v1 or v2)? And/or did you try
to run the inspection bpftrace script? If you tested the cg v1
version, it will often
end up at 50/50, 51/49 etc., and sometimes 60/40+-, making it hard to
verify without inspection.
I have attached a version of the "sub cgroup" example for cgroup v1,
that also force
the process to start on cpu 1 (CPU_ME), and sends it over to cpu 0
(CPU) after attaching
to the new cgroup. That will make it evident each time. This example should also
always end up with 50/50 per stress process, but "always" ends up more
like 99/1.
Can you confirm if you are able to reproduce with this version?
--- bash start
CGROUP_CPU=/sys/fs/cgroup/cpu/slice
CGROUP_CPUSET=/sys/fs/cgroup/cpuset/slice
CGROUP_CPUSET_ME=/sys/fs/cgroup/cpuset/me
CPU=0
CPU_ME=1
function run_sandbox {
local CG_CPUSET="$1"
local CG_CPU="$2"
local INNER_SHARES="$3"
local CMD="$4"
local PIPE="$(mktemp -u)"
mkfifo "$PIPE"
sh -c "read < $PIPE ; exec $CMD" &
local TASK="$!"
sleep .1
mkdir -p "$CG_CPUSET"
mkdir -p "$CG_CPU"/sub
tee "$CG_CPU"/sub/cgroup.procs <<< "$TASK"
tee "$CG_CPU"/sub/cpu.shares <<< "$INNER_SHARES"
tee "$CG_CPUSET"/cgroup.procs <<< "$TASK"
tee "$PIPE" <<< sandox_done
rm "$PIPE"
}
mkdir -p "$CGROUP_CPU"
mkdir -p "$CGROUP_CPUSET"
mkdir -p "$CGROUP_CPUSET_ME"
tee "$CGROUP_CPUSET"/cpuset.cpus <<< "$CPU"
tee "$CGROUP_CPUSET"/cpuset.mems <<< "$CPU"
tee "$CGROUP_CPUSET_ME"/cpuset.cpus <<< "$CPU_ME"
echo $$ | tee "$CGROUP_CPUSET_ME"/cgroup.procs
run_sandbox "$CGROUP_CPUSET" "$CGROUP_CPU/cg-1" 50000 "stress --cpu 1"
run_sandbox "$CGROUP_CPUSET" "$CGROUP_CPU/cg-2" 2 "stress --cpu 1"
read # click enter to cleanup and stop all stress procs
killall stress
sleep .2
rmdir /sys/fs/cgroup/cpuset/slice/
rmdir /sys/fs/cgroup/cpu/slice/{cg-{1,2}{/sub,},}
--- bash end
Thanks
Odin