RE: Regression seen for patch "sched:dont decrease idle sleep avg"

From: Chen, Kenneth W
Date: Wed May 17 2006 - 15:34:05 EST


Con Kolivas wrote on Wednesday, May 17, 2006 1:23 AM
> On Wednesday 17 May 2006 09:32, Tim Chen wrote:
> > It seems like just one sleep longer than INTERACTIVE_SLEEP is needed
> > kick the priority of a process all the way to MAX_BONUS-1 and boost the
> > sleep_avg, regardless of what the prior sleep_avg was.
> >
> > So if there is a cpu hog that has long sleeps occasionally, once it woke
> > up, its priority will get boosted close to maximum, likely starving out
> > other processes for a while till its sleep_avg gets reduced. This
> > behavior seems like something to avoid according to the original code
> > comment. Are we boosting the priority too quickly?
>
> Two things strike me here. I'll explain them in the patch below.
>
> How's this look?
> ---
> The relationship between INTERACTIVE_SLEEP and the ceiling is not perfect
> and not explicit enough. The sleep boost is not supposed to be any larger
> than without this code and the comment is not clear enough about what exactly
> it does, just the reason it does it.
>
> There is a ceiling to the priority beyond which tasks that only ever sleep
> for very long periods cannot surpass.


It looks bad. I don't like it. The priority boost is even more peculiar
in this patch.


> --- linux-2.6.17-rc4-mm1.orig/kernel/sched.c 2006-05-17 15:57:49.000000000 +1000
> +++ linux-2.6.17-rc4-mm1/kernel/sched.c 2006-05-17 18:19:29.000000000 +1000
> @@ -904,20 +904,14 @@ static int recalc_task_prio(task_t *p, u
> }
>
> if (likely(sleep_time > 0)) {
> - /*
> - * User tasks that sleep a long time are categorised as
> - * idle. They will only have their sleep_avg increased to a
> - * level that makes them just interactive priority to stay
> - * active yet prevent them suddenly becoming cpu hogs and
> - * starving other processes.
> - */
> - if (p->mm && sleep_time > INTERACTIVE_SLEEP(p)) {
> - unsigned long ceiling;
> + unsigned long ceiling = INTERACTIVE_SLEEP(p);
>
> - ceiling = JIFFIES_TO_NS(MAX_SLEEP_AVG -
> - DEF_TIMESLICE);
> - if (p->sleep_avg < ceiling)
> - p->sleep_avg = ceiling;
> + if (p->mm && sleep_time > ceiling && p->sleep_avg < ceiling) {
> + /*
> + * Prevents user tasks from achieving best priority
> + * with one single large enough sleep.
> + */
> + p->sleep_avg = ceiling;


The assignment of p->sleep_avg = ceiling doesn't make much logical sense.
Because INTERACTIVE_SLEEP is scaled proportionally with nice value, e.g.
the lower the nice value, the lower the interactive_sleep. However, priority
calculation is inverse of p->sleep_avg, e.g. the smaller the sleep_avg, the
smaller the bonus, thus the higher dynamic priority.

Take one concrete example: for a prolonged sleep, say 1 second, nice(-10)
will have a priority boost of 4 while nice(0) will have a priority boost of
9. The ceiling algorithm looks like is reversed. I would think kernel should
at least enforce same ceiling value independent of nice value.

- Ken
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/