Re: [PATCH] sched: Simplify cpu-hot-unplug task migration

From: Oleg Nesterov
Date: Thu Nov 18 2010 - 09:12:38 EST


On 11/17, Peter Zijlstra wrote:
>
> On Wed, 2010-11-17 at 20:27 +0100, Oleg Nesterov wrote:
>
> > > -static void migrate_dead_tasks(unsigned int dead_cpu)
> > > -{
> > > - struct rq *rq = cpu_rq(dead_cpu);
> > > - struct task_struct *next;
> > > + rq->stop = NULL;
> >
> > (or we could do current->state = TASK_INTERRUPTIBLE, afaics)
>
> Ah, you missed a patch that made pick_next_task_stop() look like:
>
> static struct task_struct *pick_next_task_stop(struct rq *rq)
> {
> struct task_struct *stop = rq->stop;
>
> if (stop && stop->se.on_rq)

Yes, thanks.

> > > for ( ; ; ) {
> > > - if (!rq->nr_running)
> > > + /*
> > > + * There's this thread running, bail when that's the only
> > > + * remaining thread.
> > > + */
> > > + if (rq->nr_running == 1)
> > > break;
> >
> > I was very much confused, and I was going to say this is wrong.
> > However, now I think this is correct, just the comment is not
> > right.
> >
> > There is another running thread we should not migrate, rq->idle.
> > If nothing else, dequeue_task_idle() should be never called.
>
> In fact, dequeue_task_idle() will yell if you try that ;-)
>
> > But, if I understand correctly, ->nr_running does not account
> > the idle thread, and this is what makes this correct.
> >
> > Correct?
>
> Right, I can add: (the idle thread is not counted in nr_running), if
> that makes things clearer for you; however its a quite fundamental
> property,

Yes, I see now.

OK, this also explains my previous questions. I greatly misunderstood
this "small detail", starting from your initial patch. Every time, I
thought you were trying to migrate rq->idle as well.
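
To spell out the accounting as I now understand it, here is a userspace
toy, not kernel code. struct demo_rq and demo_migrate_tasks() are made-up
names, and the only thing modelled is that ->nr_running counts the calling
migration thread but not rq->idle:

```c
/* Userspace sketch only; struct and function names are hypothetical. */
struct demo_rq {
	unsigned int nr_running;	/* runnable tasks, idle NOT included */
};

/*
 * Pretend to migrate tasks off a dead cpu. The caller (the migration
 * thread) is itself counted in nr_running, so nr_running >= 1 on entry
 * and the loop stops at 1, not 0. Returns the number of tasks moved.
 */
unsigned int demo_migrate_tasks(struct demo_rq *rq)
{
	unsigned int moved = 0;

	for ( ; ; ) {
		if (rq->nr_running == 1)
			break;
		rq->nr_running--;	/* one task pushed to another cpu */
		moved++;
	}
	return moved;
}
```

If idle were counted too, the same loop would try to dequeue it, which
is exactly what must never happen.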

Thanks Peter. Only one question,

> @@ -253,9 +246,12 @@ static int __ref _cpu_down(unsigned int
> }
> BUG_ON(cpu_online(cpu));
>
> - /* Wait for it to sleep (leaving idle task). */
> - while (!idle_cpu(cpu))
> - yield();
> + /*
> + * The migration_call() CPU_DYING callback will have removed all
> + * runnable tasks from the cpu, there's only the idle task left now
> + * that the migration thread is done doing the stop_machine thing.
> + */
> + BUG_ON(!idle_cpu(cpu));

I am not sure.

Yes, we know for sure that the only runnable task is rq->idle.
But this only becomes true after the migration thread calls schedule()
and switches to the idle thread.

However, I see nothing which can guarantee this. The migration thread
running on the dead cpu wakes up the caller of stop_cpus() before
it calls schedule(), so _cpu_down() can check rq->curr before it has
changed.

No?
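
To make the window concrete, a userspace toy model follows. It is not
kernel code: every toy_* name is invented for illustration, and
toy_idle_cpu() only assumes that idle_cpu() boils down to comparing
rq->curr with rq->idle.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, NOT kernel code; all names are made up. */
struct toy_task { const char *name; };

struct toy_rq {
	struct toy_task *curr;	/* task currently on the cpu */
	struct toy_task *idle;	/* this cpu's idle task */
};

/* Assumed shape of idle_cpu(): idle only if curr is the idle task. */
bool toy_idle_cpu(const struct toy_rq *rq)
{
	return rq->curr == rq->idle;
}

/* Step 1: the stopper finishes and wakes the stop_cpus() caller. */
void toy_stopper_signal_done(bool *done)
{
	*done = true;
}

/* Step 2: the stopper sleeps and the cpu switches to idle. */
void toy_stopper_schedule(struct toy_rq *rq)
{
	rq->curr = rq->idle;
}

/* Returns true if a check done between step 1 and step 2 would BUG. */
bool toy_check_in_window_would_bug(void)
{
	struct toy_task idle = { "idle" }, stopper = { "migration" };
	struct toy_rq rq = { .curr = &stopper, .idle = &idle };
	bool done = false;
	bool would_bug;

	toy_stopper_signal_done(&done);		/* caller may run now */
	would_bug = done && !toy_idle_cpu(&rq);	/* _cpu_down() checks here */
	toy_stopper_schedule(&rq);		/* too late for that check */
	assert(toy_idle_cpu(&rq));		/* idle only after step 2 */
	return would_bug;
}
```

In this model the check fails whenever it lands between the two steps,
which is the ordering I am worried about.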



Hmm. In fact, I think it is possible that cpu_stopper_thread() can
have more cpu_stop_work's queued when __stop_machine() returns.
This has nothing to do with this patch, but I think it makes sense
to clear stopper->enabled at CPU_DYING stage as well (of course,
this needs a separate patch).
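
Roughly what I have in mind, again as a userspace sketch rather than a
patch. struct demo_stopper and both functions are hypothetical; the
sketch only assumes that queueing is refused once ->enabled is cleared:

```c
#include <stdbool.h>

/* Toy model of a per-cpu stopper; not the kernel's data structure. */
struct demo_stopper {
	bool enabled;		/* may new works be queued? */
	unsigned int queued;	/* number of accepted works */
};

/* Queue one work; refuse if the stopper is disabled. */
bool demo_queue_work(struct demo_stopper *s)
{
	if (!s->enabled)
		return false;
	s->queued++;
	return true;
}

/* What a CPU_DYING hook could do in this model. */
void demo_cpu_dying(struct demo_stopper *s)
{
	s->enabled = false;
}
```

Once demo_cpu_dying() has run, no further work can pile up behind the
stop_machine one.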

Oleg.
