Re: [PATCH] stop on cpu lost
From: Nathan Lynch
Date: Thu Jun 22 2006 - 13:03:38 EST
Randy.Dunlap wrote:
> On Fri, 23 Jun 2006 01:05:50 +0900 KAMEZAWA Hiroyuki wrote:
>
> > On Thu, 22 Jun 2006 08:45:55 -0700 (PDT)
> > Christoph Lameter <clameter@xxxxxxx> wrote:
> >
> > > On Thu, 22 Jun 2006, Randy.Dunlap wrote:
> > >
> > > > Sounds much better than just killing the process.
> > >
> > > Right and having active interrupts or devices using that processor should
> > > also stop offlining a processor.
> > >
> > > So just remove everything from a processor before offlining. If you cannot
> > > remove all users then the processor cannot be offlined.
> > >
> > Hm..
> > Then, there is several ways to manage this sitation.
> >
> > 1. migrate all even if it's not allowed by users
> > 2. kill mis-configured tasks.
>
> I would claim that the tasks are not misconfigured,
> but that the admin misconfigured the hardware (CPU).
>
> > 3. stop ...
> > 4. cancel cpu-hot-removal.
> >
> > I just don't like "1".
>
> I like it better than 2.
>
> > I discussed this problem with my colleagues before sending patch,
> > one said "4" seems regular way but another said "4" is bad.
> >
> > I sent a patch for "4" in the first place but Andi Kleen said it's bad.
> > As he said, I'm handling the problem for which I can't find a good answer :(
> >
> > my point is that "1" is bad.
>
> Sounds like we are getting nowhere. The sysctl knob might
> have to be the answer.
I don't like having the kernel forcibly kill or stop tasks for this
case, regardless of whether the behavior is configurable. What I
originally meant to suggest was a sysctl knob which will control
whether the offline will fail in this situation. But I'm still more
inclined to leave the kernel's handling of this as it stands, since
this is policy that can be implemented in userspace.
We need to preserve the current behavior as the default, in any case.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/