Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

From: bsegall
Date: Fri Sep 11 2015 - 13:06:04 EST


Morten Rasmussen <morten.rasmussen@xxxxxxx> writes:

> On Fri, Sep 11, 2015 at 08:28:25AM +0800, Yuyang Du wrote:
>> On Thu, Sep 10, 2015 at 12:07:27PM +0200, Peter Zijlstra wrote:
>> > > > Still don't understand why it's a unit problem. IMHO LOAD/UTIL and
>> > > > CAPACITY have no unit.
>> > >
>> > > To be more accurate, probably, LOAD can be thought of as having unit,
>> > > but UTIL has no unit.
>> >
>> > But I'm thinking that is wrong; it should have one, esp. if we go scale
>> > the thing. Giving it the same fixed point unit as load simplifies the
>> > code.
>>
>> I think we probably are saying the same thing with different terms. Anyway,
>> let me reiterate what I said and make it a little more formalized.
>>
>> UTIL has no unit because it is pure ratio, the cpu_running%, which is in the
>> range of [0, 100%], and we increase the resolution, because we don't want
>> to lose many (due to integer rounding) by multiplying a number (say 1024), then
>> the range becomes [0, 1024].
>
> Fully agree, and with frequency invariance we basically scale running
> time to take into account that the cpu might be running slower that it
> is capable of at the highest frequency. With cpu invariance also scale
> by any difference their might be in max frequency and/or cpu
> micro-archiecture so utilization becomes comparable between cpus. One
> can also see it as we slow down or speed up time depending the current
> compute capacity of the cpu relative to the max capacity.
>
>> CAPACITY is also a ratio of ACTUAL_PERF/MAX_PERF, from (0, 1]. Even LOAD
>> is the same, a ratio of NICE_X/NICE_0, from [15/1024=0.015, 88761/1024=86.68],
>> as it only has relativity meaning (i.e., when comparing to each other).
>
> Fully agree. Though 'LOAD' is a somewhat overloaded term in the
> scheduler. Just to be clear, you refer to load.weight, load_avg is the
> multiplication of load.weight and the task runnable time ratio.
>
>> I said it has unit, it is in the sense that it looks like currency (for instance,
>> Yuan), you can use to buy CPU fair share. But it is just how you look at it and
>> there are certainly many other ways.
>>
>> So, I still propose to generalize all these with the following patch, in the
>> belief that this makes it simple and clear, and error-reducing.
>>
>> --
>>
>> Subject: [PATCH] sched/fair: Generalize the load/util averages resolution
>> definition
>>
>> A integer metric needs certain resolution to allow how much detail we
>> can look into (not losing detail by integer rounding), which also
>> determines the range of the metrics.
>>
>> For instance, to increase the resolution of [0, 1] (two levels), one
>> can multiply 1024 and get [0, 1024] (1025 levels).
>>
>> In sched/fair, a few metrics depend on the resolution: load/load_avg,
>> util_avg, and capacity (frequency adjustment). In order to reduce the
>> risks of making mistakes relating to resolution/range, we therefore
>> generalize the resolution by defining a basic resolution constant
>> number, and then formalize all metrics to depend on the basic
>> resolution. The basic resolution is 1024 or (1 << 10). Further, one
>> can recursively apply another basic resolution to increase the final
>> resolution (e.g., 1048676=1<<20).
>>
>> Signed-off-by: Yuyang Du <yuyang.du@xxxxxxxxx>
>> ---
>> include/linux/sched.h | 2 +-
>> kernel/sched/sched.h | 12 +++++++-----
>> 2 files changed, 8 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 119823d..55a7b93 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -912,7 +912,7 @@ enum cpu_idle_type {
>> /*
>> * Increase resolution of cpu_capacity calculations
>> */
>> -#define SCHED_CAPACITY_SHIFT 10
>> +#define SCHED_CAPACITY_SHIFT SCHED_RESOLUTION_SHIFT
>> #define SCHED_CAPACITY_SCALE (1L << SCHED_CAPACITY_SHIFT)
>>
>> /*
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index 68cda11..d27cdd8 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -40,6 +40,9 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
>> */
>> #define NS_TO_JIFFIES(TIME) ((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))
>>
>> +# define SCHED_RESOLUTION_SHIFT 10
>> +# define SCHED_RESOLUTION_SCALE (1L << SCHED_RESOLUTION_SHIFT)
>> +
>> /*
>> * Increase resolution of nice-level calculations for 64-bit architectures.
>> * The extra resolution improves shares distribution and load balancing of
>> @@ -53,16 +56,15 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
>> * increased costs.
>> */
>> #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load */
>> -# define SCHED_LOAD_RESOLUTION 10
>> -# define scale_load(w) ((w) << SCHED_LOAD_RESOLUTION)
>> -# define scale_load_down(w) ((w) >> SCHED_LOAD_RESOLUTION)
>> +# define SCHED_LOAD_SHIFT (SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
>> +# define scale_load(w) ((w) << SCHED_RESOLUTION_SHIFT)
>> +# define scale_load_down(w) ((w) >> SCHED_RESOLUTION_SHIFT)
>> #else
>> -# define SCHED_LOAD_RESOLUTION 0
>> +# define SCHED_LOAD_SHIFT (SCHED_RESOLUTION_SHIFT)
>> # define scale_load(w) (w)
>> # define scale_load_down(w) (w)
>> #endif
>>
>> -#define SCHED_LOAD_SHIFT (10 + SCHED_LOAD_RESOLUTION)
>> #define SCHED_LOAD_SCALE (1L << SCHED_LOAD_SHIFT)
>>
>> #define NICE_0_LOAD SCHED_LOAD_SCALE
>
> I think this is pretty much the required relationship between all the
> SHIFTs and SCALEs that Peter checked for in his #if-#error thing
> earlier, so no disagreements from my side :-)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

SCHED_LOAD_RESOLUTION and the non-SLR part of SCHED_LOAD_SHIFT are not
required to be the same value and should not be conflated.

In particular, since cgroups are on the same timeline as tasks and their
shares are not scaled by SCHED_LOAD_SHIFT in any way (but are scaled so
that SCHED_LOAD_RESOLUTION is invisible), changing that part of
SCHED_LOAD_SHIFT would cause issues, since things can assume that nice-0
= 1024. However changing SCHED_LOAD_RESOLUTION would be fine, as that is
an internal value to the kernel.

In addition, changing the non-SLR part of SCHED_LOAD_SHIFT would require
recomputing all of prio_to_weight/wmult for the new NICE_0_LOAD.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/