Re: [PATCH v8 00/10] sched: consolidation of CPU capacity and usage

From: Wanpeng Li
Date: Mon Nov 03 2014 - 08:04:17 EST



On 14/11/3 6:55, Vincent Guittot wrote:
On 3 November 2014 03:12, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
Hi Vincent,
On 14/10/31 4:47, Vincent Guittot wrote:
This patchset consolidates several changes in the capacity and the usage
tracking of the CPU. It provides a frequency invariant metric of the usage of
CPUs and generally improves the accuracy of load/usage tracking in the
scheduler. The frequency invariant metric is the foundation required for the
consolidation of cpufreq and implementation of a fully invariant load
tracking. These are currently WIP and require several changes to the load
balancer (including how it will use and interpret load and capacity metrics)
and extensive validation. The frequency invariance is done with
arch_scale_freq_capacity and this patchset doesn't provide the backends of
the function, which are architecture dependent.

As discussed at LPC14, Morten and I have consolidated our changes into a
single patchset to make it easier to review and merge.

During load balance, the scheduler evaluates the number of tasks that a group
of CPUs can handle. The current method assumes that tasks have a fixed load of
SCHED_LOAD_SCALE and CPUs have a default capacity of SCHED_CAPACITY_SCALE.
This assumption leads to wrong decisions by creating ghost cores or by

I don't know the history. Could you explain what 'ghost cores' means?
The capacity_factor is the number of tasks that a group of CPUs can
handle, computed by dividing the group's capacity by SCHED_CAPACITY_SCALE.

For a system with SMT, the default capacity of a core is 1178, so with
two threads per core the capacity of each CPU is 589.

At CPU level we have a capacity_factor of 1 = div_round_closest(589, 1024).
At core level we still have a capacity_factor of 1 =
div_round_closest(1178, 1024). This is intended behavior, to promote
1 task per core.
Then, if we have 4 cores in a node, the capacity_factor is 5 =
div_round_closest(4712, 1024) whereas we should have 4. So a 5th ghost
core has appeared in the group, and the load balancer will not
consider the group overloaded if there are 5 tasks, whereas it
should, in order to try to move this 5th task to an idle core (if there
is one).
Patch [0] solves some use cases by ensuring that we will not have more
cores than possible, so we can't have more than 4 cores in the previous
example.
Now, if some RT tasks are running and using almost 1 core (1024 as an
example), the capacity_factor is still 4 = div_round_closest(3688,
1024) whereas a core is nearly fully used and the capacity_factor
should be 3.
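
For reference, the arithmetic above can be reproduced with a small
userspace sketch. It is not code from the patchset: div_round_closest()
here is a plain unsigned rounding division (the same result the kernel's
DIV_ROUND_CLOSEST() gives for non-negative operands), and
capacity_factor() is a hypothetical helper named after the metric being
discussed.

```c
#include <stdio.h>

#define SCHED_CAPACITY_SCALE 1024UL

/* Round-to-nearest unsigned division: (x + d/2) / d. */
unsigned long div_round_closest(unsigned long x, unsigned long d)
{
	return (x + d / 2) / d;
}

/* Hypothetical helper: tasks a group "can handle" under the old method. */
unsigned long capacity_factor(unsigned long group_capacity)
{
	return div_round_closest(group_capacity, SCHED_CAPACITY_SCALE);
}

/* Print the four cases from the explanation above. */
void print_examples(void)
{
	printf("SMT CPU  (589):  %lu\n", capacity_factor(589));   /* 1 */
	printf("core     (1178): %lu\n", capacity_factor(1178));  /* 1 */
	printf("4 cores  (4712): %lu\n", capacity_factor(4712));  /* 5: ghost core */
	printf("with RT  (3688): %lu\n", capacity_factor(4712 - 1024)); /* 4, not 3 */
}
```

Calling print_examples() from main() shows the rounding artifact: four
cores at 1178 each round up to a factor of 5, and removing 1024 of
capacity for RT work still rounds to 4.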

[0] https://lkml.org/lkml/2013/8/28/194

Got it, thanks for your great explanation.

Regards,
Wanpeng Li


Regards,
Vincent

Regards,
Wanpeng Li


removing real ones when the original capacity of CPUs is different from the
default SCHED_CAPACITY_SCALE. With this patch set, we no longer try to
evaluate the number of available cores based on the group_capacity; instead
we evaluate the usage of a group and compare it with its capacity.
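
As a rough illustration of comparing usage against capacity, here is a
minimal sketch. It is not taken from the patches: the struct, the
function name and the margin_pct parameter are all assumptions made for
the example, and the real implementation may use different fields and
thresholds.

```c
#include <stdbool.h>

/* Hypothetical per-group statistics, both in SCHED_CAPACITY_SCALE units. */
struct group_stats {
	unsigned long capacity;	/* capacity left for regular tasks */
	unsigned long usage;	/* tracked usage of the group's CPUs */
};

/*
 * Assumed policy: the group counts as overloaded once its usage, scaled
 * by a margin in percent (e.g. 125), exceeds its capacity.
 */
bool group_is_overloaded(const struct group_stats *g, unsigned int margin_pct)
{
	return g->usage * margin_pct > g->capacity * 100;
}
```

With the earlier example of 3688 capacity left after RT activity, a usage
of 3600 and an assumed margin of 125 reports the group as overloaded,
while a usage of 2000 does not. No rounding to a whole number of cores is
involved, so no ghost core can appear.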

This patchset mainly replaces the old capacity_factor method with a new one
and keeps the general policy almost unchanged. These new metrics will also be
used in later patches.

[snip]
