Re: current linux-2.6.git: cpusets completely broken
From: Dmitry Adamushko
Date: Mon Jul 14 2008 - 20:00:59 EST
2008/7/15 Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>:
>
> On Tue, 15 Jul 2008, Dmitry Adamushko wrote:
>>
>> cpu_clear(cpu, cpu_active_map); _alone_ does not guarantee that after
>> its completion, no new tasks can appear on (be migrated to) 'cpu'.
>
> But I think we should make it do that.
>
> I do realize that we "queue" processes, but that's part of the whole
> complexity. More importantly, the people who do that kind of asynchronous
> queueing don't even really care - *if* they cared about the process
> _having_ to show up on the destination core, they'd be waiting
> synchronously and re-trying (which they do).
>
> So by doing the test for cpu_active_map not at queuing time, but at the
> time when we actually try to do the migration,
> we can now also make that
> cpu_active_map be totally serialized.
>
> (Of course, anybody who clears the bit does need to take the runqueue lock
> of that CPU too, but cpu_down() will have to do that as it does the
> "migrate away live tasks" anyway, so that's not a problem)
The 'synchronization' point occurs even earlier - when cpu_down() ->
__stop_machine_run() gets called (as I described in my previous mail).
My point was that if it's ok to have a _delayed_ synchronization
point, having it not immediately after cpu_clear(cpu, cpu_active_map)
but when the "runqueue lock" is taken a bit later (as you pointed out
above) or __stop_machine_run() gets executed (which is a sync point,
scheduling-wise),
then we can implement the proper synchronization (hotplugging vs.
task-migration) with cpu_online_map (no need for cpu_active_map).
Note, currently, _not_ all places in the scheduler where an actual
migration (not just queuing requests) takes place do the test for
cpu_offline(). Instead, they (blindly) rely on the assumption that if
a cpu is available via sched-domains, then it's guaranteed to be
online (and can be migrated to).
Provided all those places had cpu_offline() (additionally) in place,
the bug which has been discussed in this thread would _not_ happen
and, moreover, we would _not_ need to do all the fancy "attach NULL
domains" sched-domain manipulations (which depend on DOWN_PREPARE,
DOWN and other hotpluging events). We would only need to rebuild
domains once upon CPU_DOWN (on success).
p.s. hope my point is more understandable now (or it's clear that I'm
missing something at this late hour :^)
>
> Linus
>
--
Best regards,
Dmitry Adamushko
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/