Re: [PATCH] stop on cpu lost

From: Randy.Dunlap
Date: Thu Jun 22 2006 - 11:41:06 EST


On Thu, 22 Jun 2006 10:08:48 -0500 Nathan Lynch wrote:

> Andrew Morton wrote:
> > KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> > >
> > > Now, when a task loses all of its allowed cpus because of cpu hot removal,
> > > it will be forced to migrate to not-allowed cpus.
> > >
> > > In this case, the task was not properly reconfigured by a user before
> > > cpu-hot-removal. Here, the task (and system) is in an unexpected wrong state.
> > > This migration is perhaps one realistic workaround. But sometimes it will be
> > > harmful.
> > > (stealing other cpu time, causing bugs in thread controllers, doing some
> > > unexpected execution...)
> > >
> > > This patch adds sysctl "sigstop_on_cpu_lost". When sigstop_on_cpu_lost==1,
> > > a task which loses its cpus will be stopped by SIGSTOP.
> > > Depending on system management policy, misconfigured applications are stopped.
> > >
> >
> > Well that's a pretty unpleasant patch, isn't it?
> >
> > But I guess it's policy, and if we cannot think of anything better then we'll
> > have to do it this way :(
>
> I tend to favor not changing the kernel to handle this case. We're
> already making a best effort attempt to handle conflicting directives
> from the admin. This is a policy that can be implemented in userspace
> without much trouble.
>
> If we really want to keep the admin from shooting himself in the foot,
> wouldn't it be preferable to fail the offline operation if there are
> user tasks exclusively bound to the cpu?

Sounds much better than just killing the process.
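
For illustration only, the userspace side of such a check might look
roughly like the sketch below. It is not part of the patch under
discussion. The function names are hypothetical, and treating an empty
/proc/<pid>/cmdline as "kernel thread" is a heuristic assumption (zombies
also match). It looks only at thread-group leaders, not at individual
threads under /proc/<pid>/task.

```python
import os


def is_exclusively_bound(affinity, cpu):
    """Return True if `cpu` is the only CPU in the allowed set."""
    return set(affinity) == {cpu}


def looks_like_user_task(pid):
    """Heuristic (an assumption): kernel threads have an empty cmdline."""
    try:
        with open("/proc/%d/cmdline" % pid, "rb") as f:
            return len(f.read()) > 0
    except OSError:
        # Task went away or is not readable; do not count it.
        return False


def tasks_bound_only_to(cpu):
    """List PIDs of user tasks whose affinity mask contains only `cpu`."""
    bound = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        pid = int(entry)
        try:
            affinity = os.sched_getaffinity(pid)
        except OSError:
            # Task exited between listdir() and the query, or access denied.
            continue
        if is_exclusively_bound(affinity, cpu) and looks_like_user_task(pid):
            bound.append(pid)
    return bound


def may_offline(cpu):
    """Policy sketch: refuse the offline if any user task would be stranded."""
    return not tasks_bound_only_to(cpu)
```

A management script could call may_offline(cpu) before writing 0 to
/sys/devices/system/cpu/cpuN/online and decline to proceed (or rebind the
listed tasks first) when it returns False. The check is inherently racy,
since a task can change its affinity between the scan and the offline.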

> While we're on the subject, what if there are interrupts bound to the
> cpu you want to offline? Should we consider handling that case
> differently as well?


---
~Randy