Re: [PATCH] stop on cpu lost

From: Randy.Dunlap
Date: Thu Jun 22 2006 - 12:20:21 EST


On Fri, 23 Jun 2006 01:05:50 +0900 KAMEZAWA Hiroyuki wrote:

> On Thu, 22 Jun 2006 08:45:55 -0700 (PDT)
> Christoph Lameter <clameter@xxxxxxx> wrote:
>
> > On Thu, 22 Jun 2006, Randy.Dunlap wrote:
> >
> > > Sounds much better than just killing the process.
> >
> > Right and having active interrupts or devices using that processor should
> > also stop offlining a processor.
> >
> > So just remove everything from a processor before offlining. If you cannot
> > remove all users then the processor cannot be offlined.
> >
> Hm..
> Then, there is several ways to manage this sitation.
>
> 1. migrate all even if it's not allowed by users
> 2. kill mis-configured tasks.

I would claim that the tasks are not misconfigured,
but that the admin misconfigured the hardware (CPU).

> 3. stop ...
> 4. cancel cpu-hot-removal.
>
> I just don't like "1".

I like it better than 2.

> I discussed this problem with my colleagues before sending patch,
> one said "4" seems regular way but another said "4" is bad.
>
> I sent a patch for "4" in the first place but Andi Kleen said it's bad.
> As he said, I'm handling the problem for which I can't find a good answer :(
>
> my point is that "1" is bad.

Sounds like we are getting nowhere. The sysctl knob might
have to be the answer.

---
~Randy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/