Re: [PATCH v1 00/19] Increase resolution of load weights

From: Nikhil Rao
Date: Tue May 10 2011 - 20:14:54 EST


On Thu, May 5, 2011 at 11:59 PM, Ingo Molnar <mingo@xxxxxxx> wrote:
> * Nikhil Rao <ncrao@xxxxxxxxxx> wrote:
>> diff --git a/kernel/sched.c b/kernel/sched.c
>> index f4b4679..3dae6c5 100644
>> --- a/kernel/sched.c
>> +++ b/kernel/sched.c
>> @@ -293,7 +293,7 @@ static DEFINE_SPINLOCK(task_group_lock);
>>  *  limitation from this.)
>>  */
>>  #define MIN_SHARES	2
>> -#define MAX_SHARES	(1UL << 18)
>> +#define MAX_SHARES	(1UL << (18 + SCHED_LOAD_RESOLUTION))
>>
>>  static int root_task_group_load = ROOT_TASK_GROUP_LOAD;
>>  #endif
>> @@ -1307,14 +1307,18 @@ calc_delta_mine(unsigned long delta_exec, u64 weight, struct load_weight *lw)
>>  	u64 tmp;
>>
>>  	if (!lw->inv_weight) {
>> -		if (BITS_PER_LONG > 32 && unlikely(lw->weight >= WMULT_CONST))
>> +		unsigned long w = scale_down_load_resolution(lw->weight);
>> +		if (BITS_PER_LONG > 32 && unlikely(w >= WMULT_CONST))
>>  			lw->inv_weight = 1;
>>  		else
>> -			lw->inv_weight = 1 + (WMULT_CONST-lw->weight/2)
>> -				/ (lw->weight+1);
>> +			lw->inv_weight = 1 + (WMULT_CONST - w/2) / (w + 1);
>>  	}
>>
>> -	tmp = (u64)delta_exec * weight;
>> +	if (likely(weight > (1UL << SCHED_LOAD_RESOLUTION)))
>> +		tmp = (u64)delta_exec * scale_down_load_resolution(weight);
>> +	else
>> +		tmp = (u64)delta_exec;
>
> Couldn't the compiler figure out that on 32-bit, this:
>
>> +		tmp = (u64)delta_exec * scale_down_load_resolution(weight);
>
> is equivalent to:
>
>> +		tmp = (u64)delta_exec;
>
> ?
>
> I.e. it would be nice to check whether a reasonably recent version of GCC
> figures out this optimization by itself - in that case we can avoid the
> branching ugliness, right?
>

We added the branch to handle the case when weight < 1024 (i.e.
2^SCHED_LOAD_RESOLUTION). We shift the weight down by 10 bits so that
we do not lose accuracy/performance in calc_delta_mine(), and so that
the multiplication stays within 64 bits. Task groups with low shares
values can have sched entities with weight less than 1024, since
MIN_SHARES is still 2 (we don't scale that up). To prevent such a
weight from being scaled down to 0, we add this check and force a
lower bound of 1.
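
To make the underflow concrete, here is a standalone demo with
illustrative values (not kernel code):

	#include <stdio.h>

	#define SCHED_LOAD_RESOLUTION	10
	#define MIN_SHARES		2

	int main(void)
	{
		unsigned long weight = MIN_SHARES; /* lowest group se weight */

		/*
		 * An unclamped downshift zeroes the weight (2 >> 10 == 0),
		 * which would make delta_exec * weight vanish entirely.
		 */
		printf("scaled weight: %lu\n",
		       weight >> SCHED_LOAD_RESOLUTION);
		return 0;
	}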

I think we need the branch on 64-bit kernels. I don't like it either,
but I can't think of a good way to avoid it. Do you have any
suggestions?

For 32-bit systems, the compiler should ideally optimize this branch
away. Unfortunately gcc-4.4.3 doesn't (and I'm not sure whether a
later version does either). We could wrap the check in a macro so
that 32-bit kernels avoid the branch entirely and only 64-bit kernels
do the check?
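
Something along these lines, perhaps (just a sketch; the
calc_delta_weight() name is made up, and the 32-bit side assumes
weights stay unscaled there, so the plain multiply matches the old
behaviour):

	#if BITS_PER_LONG > 32
	/*
	 * 64-bit: weights are scaled up by SCHED_LOAD_RESOLUTION bits, so
	 * scale back down for the multiply, and treat weights below
	 * 2^SCHED_LOAD_RESOLUTION as 1 so they don't scale down to 0.
	 */
	# define calc_delta_weight(delta, weight)			\
		(likely((weight) > (1UL << SCHED_LOAD_RESOLUTION)) ?	\
			(u64)(delta) * scale_down_load_resolution(weight) : \
			(u64)(delta))
	#else
	/* 32-bit: weights are unscaled and at least MIN_SHARES == 2 */
	# define calc_delta_weight(delta, weight) ((u64)(delta) * (weight))
	#endif

calc_delta_mine() would then just do:

	tmp = calc_delta_weight(delta_exec, weight);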

> Also, the above (and the other scale-adjustment changes) probably explains why
> the instruction count went up on 64-bit.

Yes, that makes sense. With the new version of the patchset we see an
instruction count increase of about 2%, down from 5.8% (I will post
the new patchset soon). Assuming 30% of the cost of the pipe test is
scheduling, that is an effective increase of approx. 6.7% on the
scheduling path. I'll post the data and some analysis along with the
new version.
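
Spelling out that estimate (the 30% scheduling share is an
assumption):

	overall pipe-test instruction increase:   ~2%
	scheduling share of pipe-test cost:       ~30%
	effective scheduler-path increase:        0.02 / 0.30 ~= 6.7%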

>
>> @@ -1758,12 +1762,13 @@ static void set_load_weight(struct task_struct *p)
>>  	 * SCHED_IDLE tasks get minimal weight:
>>  	 */
>>  	if (p->policy == SCHED_IDLE) {
>> -		p->se.load.weight = WEIGHT_IDLEPRIO;
>> +		p->se.load.weight = scale_up_load_resolution(WEIGHT_IDLEPRIO);
>>  		p->se.load.inv_weight = WMULT_IDLEPRIO;
>>  		return;
>>  	}
>>
>> -	p->se.load.weight = prio_to_weight[p->static_prio - MAX_RT_PRIO];
>> +	p->se.load.weight = scale_up_load_resolution(
>> +			prio_to_weight[p->static_prio - MAX_RT_PRIO]);
>>  	p->se.load.inv_weight = prio_to_wmult[p->static_prio - MAX_RT_PRIO];
>
> Please create a local 'load' variable that is equal to &p->se.load, and also
> create a 'prio = p->static_prio - MAX_RT_PRIO' variable.
>
> Furthermore, please rename 'scale_up_load_resolution' to something shorter:
> scale_load() is not used within the scheduler yet so it's free for the taking.
>
> Then a lot of the above repetitious code could be written as a much nicer:
>
>        load->weight = scale_load(prio_to_weight[prio]);
>        load->inv_weight = prio_to_wmult[prio];
>
> ... and the logic becomes a *lot* more readable and the ugly linebreak is gone
> as well.
>
> Please make such a set_load_weight() cleanup patch separate from the main
> patch, so that your main patch can still be reviewed in separation.
>

Sure, will do.
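
For reference, I expect the cleaned-up function to look roughly like
this (untested sketch, with scale_load() as the renamed
scale_up_load_resolution()):

	static void set_load_weight(struct task_struct *p)
	{
		int prio = p->static_prio - MAX_RT_PRIO;
		struct load_weight *load = &p->se.load;

		/*
		 * SCHED_IDLE tasks get minimal weight:
		 */
		if (p->policy == SCHED_IDLE) {
			load->weight = scale_load(WEIGHT_IDLEPRIO);
			load->inv_weight = WMULT_IDLEPRIO;
			return;
		}

		load->weight = scale_load(prio_to_weight[prio]);
		load->inv_weight = prio_to_wmult[prio];
	}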

-Thanks,
Nikhil