Re: [PATCH] stopmachine: add stopmachine_timeout

From: Rusty Russell
Date: Sun Jul 20 2008 - 05:46:32 EST


On Wednesday 16 July 2008 14:05:31 Hidetoshi Seto wrote:
> Hi Rusty,
>
> Rusty Russell wrote:
> > On Tuesday 15 July 2008 11:11:34 Hidetoshi Seto wrote:
> >> However we need to be careful that the stuck CPU can restart
> >> unexpectedly.
> >
> > OK, if you are worried about that race, I think we can still fix it...
>
> After having a relaxing day, once I said:
> "I like your idea that if we did not want to do something on the stuck CPU
> then treat the CPU as stopped."
> but now I noticed that the stuck CPU can harm what we want to do if it is
> not real stuck... ex. busy loop in a subsystem, and we want to touch the
> core of the subsystem exclusively.

No. You aim for perfection, but there is no "right" answer other than "don't
get your system into this mess". Whatever we do is going to be an educated
guess. And guessing that there'll be no race is a very good guess indeed.

The scenario we are addressing is a stuck CPU during module load. If
stop_machine fails, the module load fails.

That is why we should continue if we can. It is also why the default timeout
cannot be 0. You can't turn this on once you notice there's a problem: it's
too late.
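
To make the trade-off concrete, here is a minimal user-space sketch of the policy argued for above. It is not the code from the patch. Every name in it (`stop_machine_sketch`, `ack_delay`, the `outcome` values) is hypothetical, and it assumes that a timeout of 0 means "no timeout, wait forever". Each simulated CPU acknowledges the stop request after `ack_delay[i]` ticks, with a negative value modelling a CPU that never responds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes for this sketch; these are not kernel identifiers. */
enum outcome {
    RAN_ALL_STOPPED,   /* every CPU acknowledged before we ran */
    RAN_STUCK_IGNORED, /* timed out, treated silent CPUs as stopped, ran anyway */
    HUNG               /* no timeout configured and a CPU never acknowledged */
};

/*
 * ack_delay[i]: ticks until simulated CPU i acknowledges; negative = never.
 * timeout:      ticks to wait before giving up on a CPU; 0 is ASSUMED to
 *               mean "timeout disabled".
 */
enum outcome stop_machine_sketch(const int *ack_delay, size_t ncpus, int timeout)
{
    size_t missing = 0;

    for (size_t i = 0; i < ncpus; i++) {
        bool never = ack_delay[i] < 0;
        bool late = timeout > 0 && ack_delay[i] > timeout;

        if (never || late)
            missing++;
    }

    if (missing == 0)
        return RAN_ALL_STOPPED;

    /* With the timeout disabled, the wait for a dead CPU never ends. */
    if (timeout == 0)
        return HUNG;

    /* Educated guess: the silent CPU will not wake up and race with us. */
    return RAN_STUCK_IGNORED;
}
```

With a nonzero default the stuck-CPU case ends in `RAN_STUCK_IGNORED` and the module load can go ahead. With 0 it ends in `HUNG`, at which point nobody can change the setting any more. The `late` branch is the race Hidetoshi describes: a CPU that is merely slow gets counted as stopped.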

If we don't want to handle this case, let's not apply any patch at all.
Rusty.
