Re: [RFC][PATCH 00/18] Increase resolution of load weights

From: Nikhil Rao
Date: Thu Apr 28 2011 - 14:21:17 EST


On Thu, Apr 28, 2011 at 4:48 AM, Nikunj A. Dadhania
<nikunj@xxxxxxxxxxxxxxxxxx> wrote:
> On Thu, 28 Apr 2011 12:37:27 +0530, "Nikunj A. Dadhania" <nikunj@xxxxxxxxxxxxxxxxxx> wrote:
>> On Wed, 20 Apr 2011 13:51:19 -0700, Nikhil Rao <ncrao@xxxxxxxxxx> wrote:
>> > Hi All,
>> >
>> > I have attached an early version of an RFC patchset to increase the resolution of
>> > sched entity load weights. This RFC introduces SCHED_LOAD_RESOLUTION, which
>> > scales NICE_0_LOAD by a factor of 1024. The scaling is done internally and should
>> > be completely invisible to the user.
>> >
>> > Why do we need this?
>> > This extra resolution allows us to scale in two dimensions: the number of CPUs and
>> > the depth of hierarchies. It also allows proper load balancing of low-weight
>> > task groups (e.g., nice +19 on autogroup).
>> >
>> > One of the big roadblocks for increasing resolution is the use of unsigned long
>> > for load.weight, which on 32-bit architectures can overflow with ~48 max-weight
>> > sched entities. In this RFC we convert all uses of load.weight to u64. This is
>> > still a work-in-progress and I have listed some of the issues I am still
>> > investigating below.
>> >
>> > I would like to get some feedback on the direction of this patchset. Please let
>> > me know if there are alternative ways of doing this, and I'll be happy to
>> > explore them as well.
>> >
>> > The patchset applies cleanly to v2.6.39-rc4. It compiles for i386 and boots on
>> > x86_64. Beyond the basic checks, it has not been well tested yet.
>> >
>> > Major TODOs:
>> > - Detect overflow in update shares calculations (time * load), and set load_avg
>> >   to the maximum possible value (~0ULL).
>> > - tg->task_weight uses an atomic, which needs to be updated to 64-bit on 32-bit
>> >   machines. We might need to add a lock to protect it instead of using atomic ops.
>> > - Check wake-affine math and effective load calculations for overflows.
>> > - Needs more testing and need to ensure fairness/balancing is not broken.
>> >
>> Hi Nikhil,
>>
>> I did a quick test creating 600 CPU-hog tasks with and without these
>> patches on a 16-CPU machine (x86_64), and I am seeing some misbehaviour:
>>
>> Base kernel - 2.6.39-rc4
>>
>> [root@krm1 ~]# time -p ./test
>> real 43.54
>> user 0.12
>> sys 1.05
>> [root@krm1 ~]#
>>
>> Base + patches
>>
>> [root@krm1 ~]# time -p ./test
>>
>> It takes almost forever; after 2 minutes I see only 16 tasks created,
>> as viewed from another ssh session to the machine.
>>
> I could get this working using the following patch, though I am not sure
> whether it has other implications. With it, I am back to saner time values
> for creating 600 CPU-hog tasks:
>
> [root@ ~]# time -p ./test
> real 45.02
> user 0.13
> sys 1.07
> [root@ ~]#
>

Nikunj,

Thanks for running the tests and identifying this issue. You are right
-- we need to scale the reference weight, else we end up with slices
that are 2^10 times the expected value. Thanks for the patch.

-Thanks,
Nikhil

> ===================================================================
>     From: Nikunj A. Dadhania <nikunj@xxxxxxxxxxxxxxxxxx>
>
>     sched: calc_delta_mine - fix calculation
>
>     All the inv_weight calculations use the scaled-down weight, but when
>     calculating tmp, weight is not scaled down by SCHED_LOAD_RESOLUTION.
>     calc_delta_mine() then returns values that are far too large, so
>     sched_slice() produces slices long enough that the scheduler decides
>     it is not yet time to preempt the currently running task.
>
>     Signed-off-by: Nikunj A. Dadhania <nikunj@xxxxxxxxxxxxxxxxxx>
>
> Index: kernel/sched.c
> ===================================================================
> --- kernel/sched.c.orig 2011-04-28 16:34:24.000000000 +0530
> +++ kernel/sched.c	2011-04-28 16:36:29.000000000 +0530
> @@ -1336,7 +1336,7 @@ calc_delta_mine(unsigned long delta_exec
>  			lw->inv_weight = 1 + (WMULT_CONST - w/2) / (w + 1);
>  	}
>
> -	tmp = (u64)delta_exec * weight;
> +	tmp = (u64)delta_exec * (weight >> SCHED_LOAD_RESOLUTION);
>  	/*
>  	 * Check whether we'd overflow the 64-bit multiplication:
>  	 */
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/