Re: [rfc][patch] sched: remove smpnice
From: Peter Williams
Date: Fri Feb 10 2006 - 22:34:14 EST
Peter Williams wrote:
Andrew Morton wrote:
"Siddha, Suresh B" <suresh.b.siddha@xxxxxxxxx> wrote:
On Tue, Feb 07, 2006 at 03:36:17PM -0800, Andrew Morton wrote:
Suresh, Martin, Ingo, Nick and Con: please drop everything,
and test this:
From: Peter Williams <pwil3058@xxxxxxxxxxxxxx>
This is a modified version of Con Kolivas's patch to add "nice"
load balancing across physical CPUs on SMP systems.
I have couple of issues with this patch.
a) on a lightly loaded system, this will result in higher priority
job hopping around from one processor to another processor.. This is
because of the code in find_busiest_group() which assumes that
SCHED_LOAD_SCALE represents a unit process load and with nice_to_bias
calculations this is no longer true(in the presence of non nice-0 tasks)
My testing showed that 178.galgel in SPECfp2000 is down by ~10% when
run with nice -20 on a 4P(8-way with HT) system compared to a nice-0
This is a bit of a surprise. Surely, even with this mod, a task
shouldn't be moved if it's the only runnable one on its CPU. If it
isn't the only runnable one on its CPU, it's not actually on the CPU and
it's not cache hot then moving it to another (presumably) idle CPU
should be a gain?
Presumably the delay waiting for the current task to exit the CPU is
less than the time taken to move the task to the new CPU? I'd guess
that this means that the task about to be moved is either: a) higher
priority than the current task on the CPU and is waiting for it to be
preempted off or b) it's equal priority (or at least next one due to be
scheduled) to the current task, waiting for the current task to
surrender the CPU and that surrender is going to happen pretty quickly
due to the current task's natural behaviour?
After a little lie down :-), I now think that this problem has been
misdiagnosed and the actual problem is that movement of high priority
tasks on lightly loaded systems is supressed by this patch rather than
it causing such tasks to hop from CPU to CPU.
The reason that I think this is that the implementation of biased_load()
makes it a likely outcome. As a shortcut to converting weighted load to
biased load it assumes the the average weighted load per runnable task
is 1 (or, equivalently, that the average biased prio per runnable is
NICE_TO_BIAS_PRIO(p)) and this means that if there's only one task to
move and its nice value is less than zero (i.e. it's high priority) then
the biased load to be moved that is calculated will be smaller than that
task's bias_prio causing it to NOT be moved by move_tasks().
Do you have any direct evidence to support your "hopping" hypothesis or
is my hypothesis equally likely?
If my hypothesis holds there is a relatively simple fix that would
involve modifying biased_load() to take into account rq->prio_bias and
rq->nr_running during its calculations. Basically, in that function,
(wload * NICE_TO_BIAS_PRIO(0)) would be replaced by (wload *
rq->prio_bias / rq->nr_running) which would, in turn, create a
requirement for rq to be passed in as an argument.
If there is direct evidence of hopping then, because of my hypothesis
above, I would shift the suspicion from find_busiest_group() to
Peter Williams pwil3058@xxxxxxxxxxxxxx
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/