Re: [discuss] Re: [patch 0/4] CPU hot-plug support for x86_64

From: Andi Kleen
Date: Tue May 24 2005 - 06:50:29 EST


On Mon, May 23, 2005 at 10:40:46AM -0700, Ashok Raj wrote:
> On Mon, May 23, 2005 at 07:12:12PM +0200, Andi Kleen wrote:
> > > The only other workable alternate would be to use the stop_machine()
> > > like thing which we use to atomically update cpu_online_map. This means we
> > > execute a high priority thread on all cpus, bringing the system to knees before
> >
> > That is not nice, agreed.
> >
> > > just adding a new cpu. On very large systems this will definitely be
> > > visible.
> >
> > I still don't quite get why it is not enough to keep interrupts
> > off until the CPU enters idle. Currently we enable them briefly
> > in the middle of the initialization (which is already dangerous
> > because interrupts can see half initialized state like out of date TSC),
> > but I hope to get rid of that soon too. With the full startup
> > in CLI would your problems be gone?
> >
>
> I think so, if we can ensure none is delivered to the partially up cpu
> we probably are covered.

You mean not delivered to its APIC or not delivered as a visible
interrupt in the instruction stream?

The latter can be ensured, the former cannot. I guess if the former is a
problem you could add a function to ack all pending interrupts after the
initial sti.

E.g. we can assume the CPU will deliver everything pending within two
instructions after the sti, and if there are interrupts left in the APIC
you can ack them. But why would they not be raised as real interrupts
at this point anyways?
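
Roughly what I have in mind for the "ack what is left" step, as a
userspace model only. The struct and the model_* names are made up for
illustration and nothing here touches a real APIC. The assumption is
that leftovers show up as bits in the in-service register and that
each EOI clears the highest one:

```c
#include <assert.h>
#include <stdint.h>

#define NR_VECTORS 256

/* Hypothetical model of the local APIC in-service register (ISR). */
struct apic_model {
	uint32_t isr[NR_VECTORS / 32];
};

/* Mark a vector as in service (accepted but not yet acked). */
static void model_set_in_service(struct apic_model *a, unsigned int vector)
{
	assert(vector < NR_VECTORS);
	a->isr[vector / 32] |= (uint32_t)1 << (vector % 32);
}

/*
 * Model of one EOI write: clears the highest in-service vector.
 * Returns 1 if a bit was cleared, 0 if nothing was in service.
 */
static int model_eoi(struct apic_model *a)
{
	for (int word = NR_VECTORS / 32 - 1; word >= 0; word--) {
		for (int bit = 31; bit >= 0; bit--) {
			uint32_t mask = (uint32_t)1 << bit;

			if (a->isr[word] & mask) {
				a->isr[word] &= ~mask;
				return 1;
			}
		}
	}
	return 0;
}

/* Ack everything left over; returns the number of EOIs issued. */
static int model_ack_leftover(struct apic_model *a)
{
	int acked = 0;

	while (model_eoi(a))
		acked++;
	return acked;
}
```

The real thing would read the ISR registers and write the EOI register
instead of flipping bits in a struct, but the loop has the same shape.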


> I am not 100% sure about the above either. If the smp_call_function
> is started with 3 cpus initially, and 1 just came up, the counts in
> the smp_call data struct could be set to 3 as a result of the new cpu
> having received this broadcast as well, and we might quit the wait early.

In the worst case an smp_call_function would be delayed for the whole
boot up time of a new CPU, which should be quite bounded. The longest
delay in there is probably the bogomips calibration, but I believe
Venkatesh recently sped that up greatly anyways so it should not be
an issue anymore. If the delay is < 1s that is probably tolerable.
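
For reference, the early-exit case quoted above can be modelled in a
few lines of userspace C. This is only a sketch: struct call_data_model
and the model_* helpers are invented for illustration and are not the
kernel's call data structure, and the CPU counts are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the counter the initiating CPU waits on. */
struct call_data_model {
	int started;	/* CPUs that have acknowledged the IPI so far */
	int expected;	/* CPUs the initiator decided to wait for */
};

/* Initiator samples the online count once and waits for the others. */
static void model_call_start(struct call_data_model *d, int online_cpus)
{
	assert(online_cpus >= 1);
	d->started = 0;
	d->expected = online_cpus - 1;
}

/* Any CPU that receives the broadcast bumps the counter. */
static void model_cpu_ack(struct call_data_model *d)
{
	d->started++;
}

/* The initiator's wait loop ends once enough acks have arrived. */
static bool model_wait_finished(const struct call_data_model *d)
{
	return d->started >= d->expected;
}
```

With 4 CPUs online the initiator waits for 3 acks. If two of the
original CPUs and a CPU that was not counted all ack, the wait ends
although one original CPU has not run the function yet.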

Or am I missing some aspect of the problem you are worried about?

-Andi