Re: [PATCH] stop on cpu lost

From: Nick Piggin
Date: Thu Jun 22 2006 - 15:56:02 EST


Hugh Dickins wrote:
> On Fri, 23 Jun 2006, Nick Piggin wrote:
>> Hugh Dickins wrote:
>>> I'd expect tasks bound to the unplugged cpu simply not to be run
>>> until "that" cpu is plugged back in.
>>
>> Yes, I don't see why swsusp tasks would need to be migrated and
>> run. OTOH, this would require more swsusp special casing, but
>> apparently that's encouraged ;)
>
> No, I wasn't meaning any swsusp special casing at all.
>
> I was just using Pavel's swsusp-related mail as the hook to raise
> the point that had been haunting me with every earlier mail on
> this subject, mails I'd already deleted.
>
> Pavel seemed to imply overriding the requested affinity for tasks
> (in preferring #1 migration); I doubted he really wanted that.

No, but it is currently the only way to do it.

What I had thought you meant was to disallow cpu unplugging,
except with the special case to allow it from swsusp when
suspending the system.



>>> With the proviso that it should be possible to "kill -9" such a task,
>>> i.e. it be allowed to run in kernel on a wrong cpu just to exit.
>>>
>>> Presumably this is difficult, because unplugging a cpu will also
>>> remove infrastructure which would, for example, allow "ps" to show
>>> such tasks. Perhaps such infrastructure should remain so long as
>>> there are tasks there.
>>
>> They'll be in the global tasklist, so there should be no reason why
>> they couldn't be migrated over to an online CPU with taskset. Shouldn't
>> require any rewrites, IIRC.


> I was afraid that "for_each_online_cpu"-type scans would skip over
> the unplugged cpus, in such a way that the homeless tasks might be
> awkwardly invisible in some contexts. If no such problem, fine.

The management stuff tends to go via the pid hashes or the global
tasklist rather than the runqueues. But you might be right that
there would be some corner cases.
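
To make the concern concrete, here is a toy userspace model (not kernel
code) of the two kinds of scan being contrasted: one that only visits
CPUs marked online, and one that walks a single global list. The names
toy_task, TOY_NR_CPUS, count_via_online_cpus and count_via_tasklist are
invented for this sketch, and it assumes each task is bound to exactly
one CPU.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 4

/* Hypothetical stand-in for a task; NOT the kernel's struct task_struct. */
struct toy_task {
    int pid;
    int cpu;    /* the single CPU this task is bound to */
};

/*
 * Per-CPU style scan: only CPUs marked online are visited, so tasks
 * bound to an offline CPU are never counted.
 */
size_t count_via_online_cpus(const struct toy_task *tasks, size_t n,
                             const bool online[TOY_NR_CPUS])
{
    size_t seen = 0;

    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        if (!online[cpu])
            continue;   /* offline CPU: its tasks are skipped */
        for (size_t i = 0; i < n; i++)
            if (tasks[i].cpu == cpu)
                seen++;
    }
    return seen;
}

/*
 * Global-list style scan: every task with a valid pid is visited,
 * regardless of whether its CPU is online.
 */
size_t count_via_tasklist(const struct toy_task *tasks, size_t n)
{
    size_t seen = 0;

    for (size_t i = 0; i < n; i++)
        if (tasks[i].pid > 0)
            seen++;
    return seen;
}
```

In this model, with CPU 1 offline and two of four tasks bound to it, the
per-CPU scan reports 2 tasks while the list walk reports all 4. Whether
any real kernel path behaves like the first scan is the open question
here.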



>> But after swsusp comes back up, it will be bringing up the same number
>> of CPUs as went down, won't it? So you shouldn't get into that
>> situation where you'd need to kill stuff, should you?
>
> I wasn't meaning "kill -9" for the swsusp case, but for the general
> unplug cpu case. We have a number of homeless tasks, which the admin
> might want to run again when "the" cpu is plugged back in; or might
> want to kill off without having to plug a cpu back in.

Possible maybe... I presumed that would lead to a nightmare of
resource deadlocks (think mutexes). I'd hoped it could still
be useful for the swsusp case where everything gets turned off
at once, though. But I could be wrong...

--
SUSE Labs, Novell Inc.