Max Krasnyansky wrote:
Greg, correct me if I'm wrong, but we seem to have the exact same issue
with the rq->rd->online map. Let's take "cpu going down" for example.
We're clearing the rq->rd->online bit on the DYING event, but nothing
AFAICS prevents another CPU from calling
rebuild_sched_domains()->partition_sched_domains() in the middle of
the hotplug sequence. partition_sched_domains() will happily reset the
rq->rd->online mask and things will fail. I'm talking about this
path:
__build_sched_domains() -> cpu_attach_domain() -> rq_attach_root()

	...
	cpu_set(rq->cpu, rd->span);
	if (cpu_isset(rq->cpu, cpu_online_map))
		set_rq_online(rq);
	...
I think you are right, but wouldn't s/online/active above fix that as well? The active_map didn't exist at the time that code went in initially ;)
Ok. I'll keep an eye on it.

btw, why didn't we convert sched*.c to use rq->rd->online when it was
introduced, i.e. instead of using cpu_online_map directly?

I think things were converted where they made sense to convert. But we also had a different goal at that time, so perhaps something was missed. If you think something else should be converted, please point it out.
In the meantime, I would suggest we consider this patch on top of yours (applies to tip/sched/devel):
----------------------
sched: Fully integrate cpus_active_map and root-domain code
Signed-off-by: Gregory Haskins <ghaskins@xxxxxxxxxx>