Re: [PATCH 3/3] sched: start stopper early

From: Oleg Nesterov
Date: Fri Oct 09 2015 - 12:53:04 EST


On 10/09, Oleg Nesterov wrote:
>
> From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>

Peter, I tried to compromise you.

> case CPU_ONLINE:
> + stop_machine_unpark(cpu);
> /*
> * At this point a starting CPU has marked itself as online via
> * set_cpu_online(). But it might not yet have marked itself
> @@ -5337,7 +5340,7 @@ static int sched_cpu_active(struct notifier_block *nfb,
> * Thus, fall-through and help the starting CPU along.
> */
> case CPU_DOWN_FAILED:
> - set_cpu_active((long)hcpu, true);
> + set_cpu_active(cpu, true);

On second thought, we can't do this (and your initial change has
the same problem).

We cannot wake up the stopper thread before set_cpu_active(). This
can lead to the same problem that was fixed by dd9d3843755da95f6
"sched: Fix cpu_active_mask/cpu_online_mask race": the stopper thread
can hit BUG_ON(td->cpu != smp_processor_id()) in smpboot_thread_fn().

Easy to fix: CPU_ONLINE should do set_cpu_active() itself and not
fall through to CPU_DOWN_FAILED:

	case CPU_ONLINE:
		set_cpu_active(cpu, true);
		stop_machine_unpark(cpu);
		break;
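
To make the ordering concrete, here is a minimal user-space sketch. It is
not kernel code: the MODEL_* actions, the trace array, record() and
sched_cpu_active_model() are hypothetical names made up for illustration,
and set_cpu_active()/stop_machine_unpark() are stubs that only log the
call. It shows nothing more than the intended call order in each case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the hotplug actions; not the kernel's values. */
enum model_action { MODEL_CPU_ONLINE, MODEL_CPU_DOWN_FAILED };

/* What each stub did, recorded in call order. */
enum model_event { EV_SET_ACTIVE, EV_UNPARK };

#define MAX_EVENTS 8
static enum model_event trace[MAX_EVENTS];
static int ntrace;

static void record(enum model_event ev)
{
	assert(ntrace < MAX_EVENTS);
	trace[ntrace++] = ev;
}

/* Stubs named after the kernel helpers; they only log the call. */
static void set_cpu_active(int cpu, bool active)
{
	(void)cpu;
	(void)active;
	record(EV_SET_ACTIVE);
}

static void stop_machine_unpark(int cpu)
{
	(void)cpu;
	record(EV_UNPARK);
}

/* Model of the proposed switch: CPU_ONLINE no longer falls through. */
static void sched_cpu_active_model(enum model_action action, int cpu)
{
	switch (action) {
	case MODEL_CPU_ONLINE:
		/* Mark the CPU active first, then let the stopper run. */
		set_cpu_active(cpu, true);
		stop_machine_unpark(cpu);
		break;
	case MODEL_CPU_DOWN_FAILED:
		set_cpu_active(cpu, true);
		break;
	}
}
```

In this model the stopper is never unparked before the CPU is marked
active, which is the property the fix above is meant to guarantee.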

But. This is further proof that stop_two_cpus() must not rely on
cpu_active().

Right?

Oleg.
