Re: [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)

From: Max Krasnyansky
Date: Tue Jul 22 2008 - 15:33:17 EST

Gregory Haskins wrote:
Max Krasnyansky wrote:
Greg, correct me if I'm wrong, but we seem to have the exact same issue with the rq->rd->online map. Let's take "cpu going down" for
example. We're clearing the rq->rd->online bit on the DYING event, but
nothing AFAICS prevents another cpu from calling
rebuild_sched_domains()->partition_sched_domains() in the middle of
the hotplug sequence. partition_sched_domains() will happily reset the
rq->rd->online mask and things will fail. I'm talking about this path:

__build_sched_domains() -> cpu_attach_domain() -> rq_attach_root()
cpu_set(rq->cpu, rd->span);
if (cpu_isset(rq->cpu, cpu_online_map))

I think you are right, but wouldn't s/online/active/ above fix that as well? The active_map didn't exist at the time that code went in initially ;)
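
To make the s/online/active/ suggestion concrete, here is a rough userspace model, not kernel code. It uses plain unsigned bitmasks in place of cpumask_t, and every name in it (model_rq_attach_root, model_rebuild_during_unplug, the two model_*_map globals) is hypothetical. It also assumes the active bit is cleared before the online bit during hot-unplug, and that a rebuild lands in between.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the root domain: one bit per CPU. */
struct root_domain_model {
    unsigned int span;    /* CPUs attached to this root domain */
    unsigned int online;  /* CPUs the scheduler may still target */
};

/* Hypothetical stand-ins for cpu_online_map and cpu_active_map. */
static unsigned int model_online_map;
static unsigned int model_active_map;

/* Mirrors the shape of the quoted snippet; use_active selects the gating map. */
static void model_rq_attach_root(struct root_domain_model *rd, int cpu,
                                 bool use_active)
{
    unsigned int gate = use_active ? model_active_map : model_online_map;

    rd->span |= 1u << cpu;
    if (gate & (1u << cpu))
        rd->online |= 1u << cpu;
}

/*
 * Simulates a domain rebuild that lands in the middle of hot-unplug.
 * Returns true if the departing CPU ends up marked online in the
 * rebuilt root domain.
 */
static bool model_rebuild_during_unplug(int cpu, bool use_active)
{
    struct root_domain_model rd = { 0, 0 };

    model_online_map = 1u << cpu;
    model_active_map = 1u << cpu;

    /* Assumption: unplug clears the active bit first, online comes later. */
    model_active_map &= ~(1u << cpu);

    /* The rebuild runs before the online bit has been cleared. */
    model_rq_attach_root(&rd, cpu, use_active);
    assert(rd.span & (1u << cpu));

    return (rd.online & (1u << cpu)) != 0;
}
```

In this model, gating on the online map re-marks the departing CPU as online in the rebuilt domain, while gating on the active map leaves it cleared.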

Actually, after a bit more thinking :) I realized that the scenario I explained above cannot happen, because partition_sched_domains() must be called under get_online_cpus() and the set_rq_online() happens in the hotplug writer's path (i.e. under cpu_hotplug.lock). Since I unified all the other domain rebuild paths (arch_reinit_sched_domains, etc.) we should be safe. But that again means we'd rely on those intricate dependencies that we wanted to avoid with the cpu_active_map. Also, cpusets might still need to rebuild the domains in the hotplug writer's path.
So it's better to fix it once and for all :)
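
The exclusion argument above can be sketched as a toy, single-threaded model. This is not the kernel implementation: the struct and the model_* functions are hypothetical, and where the real reader or writer would sleep, the model simply returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the hotplug reader/writer state. */
struct hotplug_model {
    int  readers;        /* open get/put sections, e.g. a domain rebuild */
    bool writer_active;  /* a hotplug operation is in progress */
};

/* Reader side: refused (would sleep in reality) while a writer runs. */
static bool model_get_online_cpus(struct hotplug_model *h)
{
    if (h->writer_active)
        return false;
    h->readers++;
    return true;
}

static void model_put_online_cpus(struct hotplug_model *h)
{
    assert(h->readers > 0);
    h->readers--;
}

/* Writer side: refused (would wait in reality) while readers exist. */
static bool model_hotplug_begin(struct hotplug_model *h)
{
    if (h->readers > 0 || h->writer_active)
        return false;
    h->writer_active = true;
    return true;
}

static void model_hotplug_done(struct hotplug_model *h)
{
    h->writer_active = false;
}
```

A rebuild that took the reader side cannot overlap the writer, which is why the race cannot happen today. The model also shows the limitation: code that has to rebuild domains from within the writer's path (the cpusets case) cannot go through the reader side at all, so it would fall outside this protection.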


btw, why didn't we convert sched*.c to use rq->rd->online when it was
introduced, i.e. instead of using cpu_online_map directly?
I think things were converted where they made sense to convert. But we also had a different goal at that time, so perhaps something was missed. If you think something else should be converted, please point it out.
Ok. I'll keep an eye on it.

In the meantime, I would suggest we consider this patch on top of yours (applies to tip/sched/devel):


sched: Fully integrate cpus_active_map and root-domain code
Signed-off-by: Gregory Haskins <ghaskins@xxxxxxxxxx>

The only thing I'm a bit unsure of is the error scenarios in the cpu hotplug event sequence. online_map is not cleared when something in the notifier chain fails, but active_map is.
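
To illustrate the error scenario, here is a small userspace model of a failed "cpu down" attempt. It is only a sketch: the maps are plain bitmasks, the names are hypothetical, and the restore_active_on_error flag is a made-up knob for showing the two possible outcomes, not something taken from either patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for cpu_online_map and cpu_active_map. */
struct cpu_maps_model {
    unsigned int online;
    unsigned int active;
};

/*
 * Models one "cpu down" attempt. Returns 0 on success, -1 if a
 * notifier in the chain fails. The active bit is cleared up front;
 * the online bit is cleared only if the whole sequence succeeds.
 */
static int model_cpu_down(struct cpu_maps_model *m, int cpu,
                          bool notifier_fails, bool restore_active_on_error)
{
    unsigned int bit = 1u << cpu;

    m->active &= ~bit;

    if (notifier_fails) {
        /* Hypothetical rollback: put the active bit back. */
        if (restore_active_on_error)
            m->active |= bit;
        return -1;
    }

    m->online &= ~bit;
    return 0;
}
```

Without the rollback, a failed attempt leaves the CPU online but not active, so any code keyed on the active map would treat a perfectly usable CPU as gone. With the rollback the two maps agree again after the failure.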

