Re: [RFC PATCH v3 00/16] Core scheduling v3

From: Aaron Lu
Date: Thu Aug 08 2019 - 08:55:48 EST


On Mon, Aug 05, 2019 at 08:55:28AM -0700, Tim Chen wrote:
> On 8/2/19 8:37 AM, Julien Desfossez wrote:
> > We tested both Aaron's and Tim's patches and here are our results.
> >
> > Test setup:
> > - 2 1-thread sysbench, one running the cpu benchmark, the other one the
> > mem benchmark
> > - both started at the same time
> > - both are pinned on the same core (2 hardware threads)
> > - 10 30-seconds runs
> > - test script: https://paste.debian.net/plainh/834cf45c
> > - only showing the CPU events/sec (higher is better)
> > - tested 4 tag configurations:
> > - no tag
> > - sysbench mem untagged, sysbench cpu tagged
> > - sysbench mem tagged, sysbench cpu untagged
> > - both tagged with a different tag
> > - "Alone" is the sysbench CPU running alone on the core, no tag
> > - "nosmt" is both sysbench pinned on the same hardware thread, no tag
> > - "Tim's full patchset + sched" is an experiment with Tim's patchset
> > combined with Aaron's "hack patch" to get rid of the remaining deep
> > idle cases
> > - In all test cases, both tasks can run simultaneously (which was not
> > the case without those patches), but the standard deviation is a
> > pretty good indicator of the fairness/consistency.
>
> Thanks for testing the patches and giving such detailed data.
>
> I came to realize that in my scheme, the accumulated deficit from forced idle could be wiped
> out by a single execution of a task on the forced-idle cpu, once its min_vruntime is updated,
> even if that execution time is far less than the accumulated deficit.
> That's probably one reason my scheme didn't achieve fairness.

Turns out there is a typo in v3 when setting the rq's core_forceidle:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 26fea68f7f54..542974a8da18 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3888,7 +3888,7 @@ next_class:;
 		WARN_ON_ONCE(!rq_i->core_pick);
 
 		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
-			rq->core_forceidle = true;
+			rq_i->core_forceidle = true;
 
 		rq_i->core_pick->core_occupation = occ;

With this fixed, and together with the patch to make schedule always
happen, your latest two patches work well for the 10s cpuhog test I
described previously:
https://lore.kernel.org/lkml/20190725143003.GA992@aaronlu/

An overloaded workload without any cpu binding still doesn't work well,
though; I haven't taken a closer look yet.