Re: [PATCH v10 00/11] sched: consolidation of CPU capacity and usage

From: Vincent Guittot
Date: Thu Apr 02 2015 - 03:31:21 EST

On 2 April 2015 at 03:47, Wanpeng Li wrote:
> Hi Vincent,
> On Fri, Feb 27, 2015 at 04:54:03PM +0100, Vincent Guittot wrote:
>>This patchset consolidates several changes in the capacity and the usage
>>tracking of the CPU. It provides a frequency invariant metric of the usage of
>>CPUs and generally improves the accuracy of load/usage tracking in the
>>scheduler. The frequency invariant metric is the foundation required for the
>>consolidation of cpufreq and implementation of a fully invariant load tracking.
>>These are currently WIP and require several changes to the load balancer
>>(including how it will use and interpret load and capacity metrics) and
>>extensive validation. The frequency invariance is done with
>>arch_scale_freq_capacity and this patchset doesn't provide the backends of
>>the function which are architecture dependent.
>>As discussed at LPC14, Morten and I have consolidated our changes into a single
>>patchset to make it easier to review and merge.
>>During load balance, the scheduler evaluates the number of tasks that a group
>>of CPUs can handle. The current method assumes that tasks have a fixed load of
>>SCHED_LOAD_SCALE and CPUs have a default capacity of SCHED_CAPACITY_SCALE.
>>This assumption generates wrong decisions by creating ghost cores or by
>>removing real ones when the original capacity of CPUs is different from the
>>default SCHED_CAPACITY_SCALE. With this patch set, we don't try anymore to
>>evaluate the number of available cores based on the group_capacity but instead
>>we evaluate the usage of a group and compare it with its capacity.
>>This patchset mainly replaces the old capacity_factor method by a new one and
>>keeps the general policy almost unchanged. These new metrics will also be used
>>in later patches.
>>The CPU usage is based on a running time tracking version of the current
>>implementation of the load average tracking. I also have a version that is
>>based on the new implementation proposal [1] but I haven't provided the patches
>>and results as [1] is still under review. I can provide changes on top of [1] to
>>change how CPU usage is computed and to adapt to the new mechanism.
> Is there performance data for this cpu capacity and usage improvement?

I don't have data for this version, but I have published figures for
the previous one.

This patchset consolidates the tracking of CPU usage and capacity for
all kinds of architectures and use cases by improving the detection of
overloaded
Regarding perf bench on an SMP system, where the goal is to use all
available CPUs and computing capacity, we should not see a perf
improvement, but we will not see a perf regression either.
The difference is noticeable in mid-load use cases or when RT tasks or
IRQs are involved.
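
To make the "ghost cores" and "removed cores" point from the quoted cover
letter concrete, here is a rough sketch in C. It is not code from the
patches: old_core_estimate() and usage_fits_capacity() are hypothetical
names, the rounding is a simplified stand-in for the old capacity_factor
computation, and margin_pct is an assumed safety margin. Only
SCHED_CAPACITY_SCALE (1024) is taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE 1024UL

/*
 * Simplified stand-in for the old method: estimate how many cores a group
 * has by rounding its summed capacity to the nearest multiple of
 * SCHED_CAPACITY_SCALE. Hypothetical helper, not the kernel function.
 */
unsigned long old_core_estimate(unsigned long group_capacity)
{
	return (group_capacity + SCHED_CAPACITY_SCALE / 2) / SCHED_CAPACITY_SCALE;
}

/*
 * Simplified stand-in for the new method: a group still has room when its
 * capacity exceeds its usage scaled by a margin (margin_pct >= 100).
 * Hypothetical helper and parameter names.
 */
bool usage_fits_capacity(unsigned long group_capacity,
			 unsigned long group_usage,
			 unsigned int margin_pct)
{
	assert(margin_pct >= 100);
	return group_capacity * 100 > group_usage * margin_pct;
}
```

With these assumptions, four CPUs of capacity 640 sum to 2560 and are
estimated as 3 cores (one real core removed), while two CPUs of capacity
1400 sum to 2800 and are also estimated as 3 cores (one ghost core).
Comparing usage with capacity avoids counting cores at all: a group of
capacity 2560 with a usage of 2000 still has room at a 125% margin, and
the same group with a usage of 2100 does not.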

> Regards,
> Wanpeng Li
>>Change since V9
>> - add a dedicated patch for removing unused capacity_orig
>> - update some comments and fix typo
>> - change the condition for actively migrating task on CPU with higher capacity
>>Change since V8
>> - reorder patches
>>Change since V7
>> - add freq invariance for usage tracking
>> - add freq invariance for scale_rt
>> - update comments and commits' message
>> - fix init of utilization_avg_contrib
>> - fix prefer_sibling
>>Change since V6
>> - add group usage tracking
>> - fix some commits' messages
>> - minor fix like comments and argument order
>>Change since V5
>> - remove patches that have been merged since v5 : patches 01, 02, 03, 04, 05, 07
>> - update commit log and add more details on the purpose of the patches
>> - fix/remove useless code with the rebase on patchset [2]
>> - remove capacity_orig in sched_group_capacity as it is not used
>> - move code in the right patch
>> - add some helper function to factorize code
>>Change since V4
>> - rebase to manage conflicts with changes in selection of busiest group
>>Change since V3:
>> - add usage_avg_contrib statistic which sums the running time of tasks on a rq
>> - use usage_avg_contrib instead of runnable_avg_sum for cpu_utilization
>> - fix replacement power by capacity
>> - update some comments
>>Change since V2:
>> - rebase on top of capacity renaming
>> - fix wake_affine statistic update
>> - rework nohz_kick_needed
>> - optimize the active migration of a task from CPU with reduced capacity
>> - rename group_activity by group_utilization and remove unused total_utilization
>> - repair SD_PREFER_SIBLING and use it for SMT level
>> - reorder patchset to gather patches with same topics
>>Change since V1:
>> - add 3 fixes
>> - correct some commit messages
>> - replace capacity computation by activity
>> - take into account current cpu capacity
>>Morten Rasmussen (2):
>> sched: Track group sched_entity usage contributions
>> sched: Make sched entity usage tracking scale-invariant
>>Vincent Guittot (9):
>> sched: add utilization_avg_contrib
>> sched: remove frequency scaling from cpu_capacity
>> sched: make scale_rt invariant with frequency
>> sched: add per rq cpu_capacity_orig
>> sched: get CPU's usage statistic
>> sched: replace capacity_factor by usage
>> sched: remove unused capacity_orig from
>> sched: add SD_PREFER_SIBLING for SMT level
>> sched: move cfs task on a CPU with higher capacity
>> include/linux/sched.h | 21 ++-
>> kernel/sched/core.c | 15 +--
>> kernel/sched/debug.c | 12 +-
>> kernel/sched/fair.c | 366 +++++++++++++++++++++++++++++++-------------------
>> kernel/sched/sched.h | 15 ++-
>> 5 files changed, 271 insertions(+), 158 deletions(-)
>>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>the body of a message to majordomo@xxxxxxxxxxxxxxx
>>More majordomo info at
>>Please read the FAQ at