Re: [PATCH] stop on cpu lost

From: Nathan Lynch
Date: Thu Jun 22 2006 - 11:07:45 EST


Andrew Morton wrote:
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> >
> > Now, when a task loses all of its allowed cpus because of cpu hot removal,
> > it will be foreced to migrate to not-allowed cpus.
> >
> > In this case, the task is not properly reconfigurated by a user before
> > cpu-hot-removal. Here, the task (and system) is in a unexpeced wrong state.
> > This migration is maybe one of realistic workarounds. But sometimes it will be
> > harmfull.
> > (stealing other cpu time, making bugs in thread controllers, do some unexpected
> > execution...)
> >
> > This patch adds sysctl "sigstop_on_cpu_lost". When sigstop_on_cpu_lost==1,
> > a task which losts is cpu will be stopped by SIGSTOP.
> > Depends on system management policy, mis-configurated applications are stopped.
> >
>
> Well that's a pretty unpleasant patch, isn't it?
>
> But I guess it's policy, and if we cannot think of anything better then we'll
> have to do it this way :(

I tend to favor not changing the kernel to handle this case. We're
already making a best effort attempt to handle conflicting directives
from the admin. This is a policy that can be implemented in userspace
without much trouble.

If we really want to keep the admin shooting himself in the foot,
wouldn't it be preferable to fail the offline operation if there are
user tasks exclusively bound to the cpu?

While we're on the subject, what if there are interrupts bound to the
cpu you want to offline? Should we consider handling that case
differently as well?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/