Re: [RFC PATCH v3 00/16] Core scheduling v3

From: Julien Desfossez
Date: Fri May 31 2019 - 17:12:08 EST


> My first reaction is: when shell wakes up from sleep, it will
> fork date. If the script is untagged and those workloads are
> tagged and all available cores are already running workload
> threads, the forked date can lose to the running workload
> threads due to __prio_less() can't properly do vruntime comparison
> for tasks on different CPUs. So those idle siblings can't run
> date and are idled instead. See my previous post on this:
>
> https://lore.kernel.org/lkml/20190429033620.GA128241@aaronlu/
> (Now that I re-read my post, I see that I didn't make it clear
> that se_bash and se_hog are assigned different tags(e.g. hog is
> tagged and bash is untagged).
>
> Siblings being forced idle is expected due to the nature of core
> scheduling, but when two tasks belonging to two siblings are
> fighting for schedule, we should let the higher priority one win.
>
> It used to work on v2 is probably due to we mistakenly
> allow different tagged tasks to schedule on the same core at
> the same time, but that is fixed in v3.

I confirm this is indeed what is happening. We reproduced it with a
simple script that uses only one core (CPUs 2 and 38 are siblings on
this machine):

setup:
cgcreate -g cpu,cpuset:test
cgcreate -g cpu,cpuset:test/set1
cgcreate -g cpu,cpuset:test/set2
echo 2,38 > /sys/fs/cgroup/cpuset/test/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/test/cpuset.mems
echo 2,38 > /sys/fs/cgroup/cpuset/test/set1/cpuset.cpus
echo 2,38 > /sys/fs/cgroup/cpuset/test/set2/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/test/set1/cpuset.mems
echo 0 > /sys/fs/cgroup/cpuset/test/set2/cpuset.mems
echo 1 > /sys/fs/cgroup/cpu,cpuacct/test/set1/cpu.tag

In one terminal:
sudo cgexec -g cpu,cpuset:test/set1 sysbench --threads=1 --time=30 \
    --test=cpu run

In another one:
sudo cgexec -g cpu,cpuset:test/set2 date

It's very clear that 'date' hangs until sysbench is done.

We started experimenting with marking a task on the forced-idle sibling
when the normalized vruntimes are equal. That way, at the next
comparison, if the normalized vruntimes are still equal, the scheduler
prefers the task on the forced-idle sibling. It still needs more work,
but in our early tests it helps.
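
To make the idea concrete, here is a rough user-space sketch of that
tie-break. It is only a model, not kernel code: struct model_task, its
fields, and model_prio_less() are hypothetical names made up for
illustration, and clearing the mark once the vruntimes differ again is
an assumption on top of the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a task (not task_struct). */
struct model_task {
	int64_t norm_vruntime;	/* vruntime normalized across runqueues */
	bool on_forced_idle;	/* task waits on a sibling that was forced idle */
	bool tie_marked;	/* set when an earlier compare ended in a tie */
};

/*
 * Returns true if @a has lower priority than @b, i.e. @b should run.
 * On a first tie, the task on the forced-idle sibling is only marked.
 * If the next compare is still a tie, the marked task is preferred.
 */
static bool model_prio_less(struct model_task *a, struct model_task *b)
{
	int64_t delta = a->norm_vruntime - b->norm_vruntime;

	if (delta != 0) {
		/* Assumption: a non-tie result drops any stale mark. */
		a->tie_marked = false;
		b->tie_marked = false;
		return delta > 0;	/* larger vruntime loses */
	}

	/* Still tied: prefer a task that was marked on the previous tie. */
	if (b->on_forced_idle && b->tie_marked)
		return true;
	if (a->on_forced_idle && a->tie_marked)
		return false;

	/* First tie: mark the forced-idle side, keep the current order. */
	if (a->on_forced_idle)
		a->tie_marked = true;
	if (b->on_forced_idle)
		b->tie_marked = true;
	return false;
}
```

In this model, the first tied compare between the running hog and the
waiting task keeps the hog, and a second tied compare picks the waiting
task, so the waiting task is delayed by one comparison instead of being
starved.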

Thanks,

Julien