Re: [BUG] CFS vs cpu hotplug

From: Ingo Molnar
Date: Sun Jun 29 2008 - 02:57:10 EST



* Dmitry Adamushko <dmitry.adamushko@xxxxxxxxx> wrote:

> Hello,
>
> it seems to be related to migrate_dead_tasks().
>
> Firstly I added traces to see all tasks being migrated with
> migrate_live_tasks() and migrate_dead_tasks(). On my setup the problem
> pops up (the one with "se == NULL" in the loop of
> pick_next_task_fair()) shortly after the traces indicate that some task
> has been migrated with migrate_dead_tasks(). Btw, I can reproduce it
> much faster now with just a plain cpu down/up loop.
>
> [disclaimer] Well, unless I'm really missing something important at
> this late hour [/disclaimer] pick_next_task() on its own is not
> appropriate for migrate_dead_tasks() :-)
>
> the following change seems to eliminate the problem on my setup
> (although, I kept it running only for a few minutes to get a few
> messages indicating migrate_dead_tasks() does move tasks and the
> system is still ok)
>
> [ quick hack ]
>
> @@ -5887,6 +5907,7 @@ static void migrate_dead_tasks(unsigned int dead_cpu)
>  		next = pick_next_task(rq, rq->curr);
>  		if (!next)
>  			break;
> +		next->sched_class->put_prev_task(rq, next);
>  		migrate_dead(dead_cpu, next);
>
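To illustrate why a pick without a matching put can end in "se == NULL",
here is a small user-space toy model in C. It is only a sketch under
assumptions that the thread does not spell out: that picking an entity
takes it out of the queue's tree and marks it current, that putting it
returns it to the tree, and that tasks sit in a group which is itself
queued on a root queue. Every name (toy_q, toy_pick, toy_put,
toy_dequeue, toy_drain) is hypothetical and none of this is kernel code.

```c
#include <assert.h>

/* Toy queue: counts only, no real tree. Hypothetical, not kernel API. */
struct toy_q {
	int in_tree;	/* entities that can currently be picked */
	int nr_running;	/* entities the queue accounts as runnable */
	int has_curr;	/* 1 if an entity was picked and not yet put back */
};

/*
 * Pick: take one entity out of the tree and mark it current.
 * Returns -1 when the tree is empty, the analogue of "se == NULL".
 */
static int toy_pick(struct toy_q *q)
{
	if (q->in_tree == 0)
		return -1;
	q->in_tree--;
	q->has_curr = 1;
	return 0;
}

/* Put: return the current entity, if any, to the tree. */
static void toy_put(struct toy_q *q)
{
	if (q->has_curr) {
		q->in_tree++;
		q->has_curr = 0;
	}
}

/* Dequeue for migration: touch the tree only if the entity sits in it. */
static void toy_dequeue(struct toy_q *q)
{
	assert(q->nr_running > 0);
	if (q->has_curr)
		q->has_curr = 0;	/* entity was current, not in the tree */
	else
		q->in_tree--;
	q->nr_running--;
}

/*
 * Drain all tasks of one group from a root queue.
 * Returns the number of tasks migrated, or -1 if a pick fails while
 * the root queue still accounts something as runnable.
 */
static int toy_drain(int nr_tasks, int call_put)
{
	struct toy_q root = { nr_tasks > 0, nr_tasks > 0, 0 };
	struct toy_q group = { nr_tasks, nr_tasks, 0 };
	int migrated = 0;

	while (root.nr_running) {
		if (toy_pick(&root) || toy_pick(&group))
			return -1;
		if (call_put) {
			toy_put(&group);
			toy_put(&root);
		}
		toy_dequeue(&group);		/* the task leaves its group */
		if (group.nr_running == 0)
			toy_dequeue(&root);	/* empty group leaves the root */
		migrated++;
	}
	return migrated;
}
```

In this model toy_drain(2, 0) fails on the second iteration: the group
was picked from the root queue and never put back, so the root still
counts it as runnable but has nothing left to pick. toy_drain(2, 1)
migrates both tasks.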

thanks Dmitry - i've applied this chunk to tip/master and
tip/sched/urgent, for more testing.

if this turns out to be the final and full fix today, would you mind
submitting the rest of your checks as well? It seems like a rather
sensible set of sanity checks. They could go under CONFIG_SCHED_DEBUG or
a new (default-off) config option.

it would also be _very_ nice to have a built-in cpu hotplug tester in
the kernel, a la CONFIG_RCU_TORTURE_TEST=y. There's already sample code
in kernel/tracing/ showing how to initiate hotplug events from within
the kernel.
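
Until such an in-kernel tester exists, the plain cpu down/up loop
mentioned above can be approximated from user space. The following C
sketch is not the built-in tester proposed here; it assumes the usual
sysfs control file (for example /sys/devices/system/cpu/cpu1/online) and
root privileges, and the helper names set_cpu_online and hotplug_loop
are made up for illustration.

```c
#include <stdio.h>

/* Write "0" or "1" to a CPU's sysfs "online" file. Returns 0 on success. */
static int set_cpu_online(const char *online_path, int online)
{
	FILE *f = fopen(online_path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(online ? "1\n" : "0\n", f) >= 0;
	if (fclose(f) != 0)	/* the write error, if any, surfaces here */
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Take the CPU down and bring it back up `cycles` times.
 * Returns the number of complete down/up cycles; stops at the first
 * failed write.
 */
static int hotplug_loop(const char *online_path, int cycles)
{
	int done = 0;

	for (int i = 0; i < cycles; i++) {
		if (set_cpu_online(online_path, 0) != 0)
			break;
		if (set_cpu_online(online_path, 1) != 0)
			break;
		done++;
	}
	return done;
}
```

Calling hotplug_loop("/sys/devices/system/cpu/cpu1/online", 1000) as
root would cycle cpu1 a thousand times, assuming that CPU exists and can
be taken offline on the machine in question.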

Ingo