Re: [PATCH] active_load_balance() deadlock

From: Ingo Molnar
Date: Tue Jun 01 2004 - 15:47:06 EST



* Linus Torvalds <torvalds@xxxxxxxx> wrote:

> On Tue, 1 Jun 2004, Bjorn Helgaas wrote:
> >
> > active_load_balance() looks susceptible to deadlock when busiest==rq.
> > Without the following patch, my 128-way box deadlocks consistently
> > during boot-time driver init.
>
> Makes sense. The regular "load_balance()" already has that test,
> although it also makes it a WARN_ON() for some unexplained reason (I
> assume find_busiest_group() isn't supposed to find the local group,
> although it doesn't seem to be documented anywhere).
>
> Ingo, Andrew?

looks good to me. The condition is 'impossible', but the whole balancing
code is (intentionally) a bit racy:

	cpus_and(tmp, group->cpumask, cpu_online_map);
	if (!cpus_weight(tmp))
		goto next_group;

	for_each_cpu_mask(i, tmp) {
		if (!idle_cpu(i))
			goto next_group;
		push_cpu = i;
	}

	rq = cpu_rq(push_cpu);
	double_lock_balance(busiest, rq);
	move_tasks(rq, push_cpu, busiest, 1, sd, IDLE);

in the for_each_cpu_mask() loop we specifically check that each CPU in
the target group is idle - so push_cpu's runqueue == busiest [==
current runqueue] cannot be true, because the current CPU is not idle:
we are running in the migration thread. But the race is not a real
problem: we do load-balancing in a racy way to reduce overhead [and it's
all statistics anyway, so absolute accuracy is impossible], and active
balancing itself is somewhat racy, because the migration-thread wakeup
(and the active_balance flag) go outside the runqueue locks [for
similar reasons].

so it all looks quite plausible - the normal SMP boxes don't trigger it,
but Bjorn's 128-CPU setup with a non-trivial domain hierarchy triggers
it.
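
to make the failure mode concrete, here is a minimal userspace sketch
of why busiest == rq deadlocks and what the extra test buys us. It is
not the kernel code and not Bjorn's patch: toy_rq, toy_lock(),
toy_unlock() and toy_active_balance() are hypothetical names, the toy
lock returns -1 on a recursive acquire where a real spinlock would spin
forever, and it assumes the caller already holds busiest's lock before
trying to take the target's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a per-CPU runqueue; not kernel code. */
struct toy_rq {
	int locked;     /* 1 while the toy lock is held */
	int nr_running; /* number of runnable tasks */
};

/*
 * Toy lock. A real spinlock would spin forever if its holder tried to
 * take it again; this sketch reports that case with -1 instead of hanging.
 */
static int toy_lock(struct toy_rq *rq)
{
	if (rq->locked)
		return -1;
	rq->locked = 1;
	return 0;
}

static void toy_unlock(struct toy_rq *rq)
{
	rq->locked = 0;
}

/*
 * Assumption: the caller already holds busiest's lock.
 * Returns 1 if a task was moved, 0 if balancing was skipped,
 * -1 if taking the target lock would self-deadlock.
 */
static int toy_active_balance(struct toy_rq *busiest, struct toy_rq *target,
			      int guard)
{
	int moved = 0;

	assert(busiest != NULL && target != NULL);

	/* The extra test: never try to balance a runqueue against itself. */
	if (guard && target == busiest)
		return 0;

	/* Without the guard, target == busiest re-takes a lock we hold. */
	if (toy_lock(target) != 0)
		return -1;

	if (busiest->nr_running > 1) {
		busiest->nr_running--;
		target->nr_running++;
		moved = 1;
	}

	toy_unlock(target);
	return moved;
}
```

with busiest's lock held, toy_active_balance(&a, &a, 0) hits the
recursive acquire, toy_active_balance(&a, &a, 1) backs out harmlessly,
and a distinct target is balanced as usual.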

Signed-off-by: Ingo Molnar <mingo@xxxxxxx>

Ingo