Re: [PATCH v3 4/5] sched/pelt: Add a new runnable average signal

From: Vincent Guittot
Date: Fri Feb 21 2020 - 03:56:32 EST


On Thu, 20 Feb 2020 at 17:11, Valentin Schneider
<valentin.schneider@xxxxxxx> wrote:
>
> On 20/02/2020 14:36, Vincent Guittot wrote:
> > I agree that setting by default to SCHED_CAPACITY_SCALE is too much
> > for little core.
> > The problem for little core can be fixed by using the cpu capacity instead
> >
>
> So that's indeed better for big.LITTLE & co. Any reason however for not
> aligning with the initialization of util_avg ?

The runnable_avg is the unweighted version of the load_avg, so they
should both be in sync at init, and SCHED_CAPACITY_SCALE is in fact
the right value. That said, cpu_scale is the same as
SCHED_CAPACITY_SCALE on SMP systems and on big cores, so we can use
it instead.
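
For the record, with cpu_scale the init would look like something
below (just a sketch, assuming arch_scale_cpu_capacity() is the
source of the CPU capacity, as elsewhere in fair.c):

	void post_init_entity_util_avg(struct task_struct *p)
	{
		struct sched_entity *se = &p->se;
		struct cfs_rq *cfs_rq = cfs_rq_of(se);
		struct sched_avg *sa = &se->avg;
		/* capacity of the CPU the task has been forked on */
		long cpu_scale = arch_scale_cpu_capacity(cpu_of(rq_of(cfs_rq)));

		...

		/*
		 * A new task may end up occupying the whole CPU, so start
		 * runnable at the CPU's capacity rather than at
		 * SCHED_CAPACITY_SCALE, which over-estimates little cores.
		 */
		sa->runnable_avg = cpu_scale;

		...
	}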

Then, the initial value of util_avg has never reflected a realistic
value for the utilization of a new task, especially if that task ends
up being a big one. runnable_avg now balances this effect by stating
that we don't know what the behavior of the new task will be: it
might end up using all the spare capacity even though the current
utilization is low and the CPU doesn't look fully used. This is in
fact exactly the purpose of runnable: to highlight that there might
be no spare capacity even when the CPU's utilization is low, because
of external events like task migrations or new tasks with a most
probably wrong utilization.
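
To illustrate with a hypothetical helper (not part of the patch):

	/*
	 * util_avg only accounts the time tasks actually ran, so it lags
	 * behind after a fork or a migration, whereas runnable_avg also
	 * accounts the time spent waiting on the runqueue and immediately
	 * shows that the CPU might have no spare capacity left.
	 */
	static inline bool cpu_no_spare_capacity(int cpu)
	{
		struct rq *rq = cpu_rq(cpu);

		return READ_ONCE(rq->cfs.avg.runnable_avg) >=
		       arch_scale_cpu_capacity(cpu);
	}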

That being said, there is a bigger problem with the current version
of this patch: I forgot to account for runnable in
update_sg_wakeup_stats(). I have a patch that fixes this.
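
The fix boils down to accumulating runnable in the per-CPU loop of
update_sg_wakeup_stats(), mirroring what update_sg_lb_stats() already
does. Something along these lines (just a sketch, the exact helper
may differ in the final patch):

	for_each_cpu(i, sched_group_span(group)) {
		struct rq *rq = cpu_rq(i);

		sgs->group_load += cpu_load_without(rq, p);
		sgs->group_util += cpu_util_without(i, p);
		/* account runnable as update_sg_lb_stats() does */
		sgs->group_runnable += cpu_runnable(rq);
		...
	}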

Also, I have tested both proposals with hackbench on my octo-core,
and using cpu_scale gives slightly better results than util_avg,
which probably reflects the case I mentioned above.

grp   cpu_scale           util_avg            improvement
1     1.191 (+/-0.77 %)   1.204 (+/-1.16 %)   -1.07 %
4     1.147 (+/-1.14 %)   1.195 (+/-0.52 %)   -4.21 %
8     1.112 (+/-1.52 %)   1.124 (+/-1.45 %)   -1.12 %
16    1.163 (+/-1.72 %)   1.169 (+/-1.58 %)   -0.45 %

>
> With the default MC imbalance_pct (117), it takes 875 utilization to make
> a single CPU group (with 1024 capacity) overloaded (group_is_overloaded()).
> For a completely idle CPU, that means forking at least 3 tasks (512 + 256 +
> 128 util_avg)
>
> With your change, it only takes 2 tasks. I know I'm being nitpicky here, but
> I feel like those should be aligned, unless we have a proper argument against

I don't see any reason to keep them aligned.

> it - in which case this should also appear in the changelog with so far only
> mentions issues with util_avg migration, not the fork time initialization.
>
> > @@ -796,6 +794,8 @@ void post_init_entity_util_avg(struct task_struct *p)
> > }
> > }
> >
> > + sa->runnable_avg = cpu_scale;
> > +
> > if (p->sched_class != &fair_sched_class) {
> > /*
> > * For !fair tasks do:
> >>
> >>> sa->load_avg = scale_load_down(se->load.weight);
> >>> + }
> >>>
> >>> /* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
> >>> }