Re: [ANNOUNCE] [PATCH] Node Hotplug Support

From: Keiichiro Tokunaga
Date: Sun May 09 2004 - 21:22:35 EST


On Fri, 07 May 2004 21:10:58 +0200
Andi Kleen <ak@xxxxxx> wrote:

> Keiichiro Tokunaga <tokunaga.keiich@xxxxxxxxxxxxxx> writes:
>
> Hello,
> >
> > Here, a "node" is a container device that contains CPU, memory,
> > and/or I/O devices. This definition of "node" is different from
> > that of a "NUMA node".
>
> This does not really fit the Linux/Unix philosophy of doing one thing
> with one mechanism, doing it well, and keeping policy in user space.

Policy could be moved to user space, as you mentioned.

> Better would be to do individual mechanisms to hot plug these various
> things; and if you feel the need to combine them e.g. for administrative
> purposes do this in user space.

"node hotplug" of LHNS is not just a controller to invoke CPU,
memory, IO hotplug with a policy. It does hotplug a container
device also. When a container device is attached to the system,
it should be detected by someone. Node hotplug (kernel) can do it.
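
For illustration, a minimal sketch of the kernel side on an ACPI
platform might look like the following. The node_hotplug_add() and
node_hotplug_remove() names are hypothetical; the notify handler
interface is the existing ACPI one:

#include <linux/kernel.h>
#include <acpi/acpi_bus.h>

/*
 * ACPI fires a Bus Check / Device Check notification when a
 * container device is hot-added.  The kernel has to be listening
 * here; user space cannot see the event until the kernel turns it
 * into something visible.
 */
static void acpi_container_notify(acpi_handle handle, u32 event,
                                  void *context)
{
        switch (event) {
        case ACPI_NOTIFY_BUS_CHECK:
        case ACPI_NOTIFY_DEVICE_CHECK:
                /* enumerate the new container and its children */
                node_hotplug_add(handle);       /* hypothetical */
                break;
        case ACPI_NOTIFY_EJECT_REQUEST:
                /* start hot-removal of the container */
                node_hotplug_remove(handle);    /* hypothetical */
                break;
        }
}

/* registered once per container object in the ACPI namespace */
static acpi_status container_register(acpi_handle handle)
{
        return acpi_install_notify_handler(handle, ACPI_SYSTEM_NOTIFY,
                                           acpi_container_notify, NULL);
}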

> > Why?
> > ====
> > Someone might ask, "Why don't we invoke CPU, memory, and IO
> > hotplug individually, without node hotplug?". However, CPU,
> > memory, and IO hotplug cannot physically remove a node (container
> > device) from the system. That is node hotplug's job. Also,
>
> You can remove it physically as soon as all the hardware
> in it is removed. There can be many more things on such a "node"
> than you listed anyway: processes which are bound to the CPUs
> of the node (which need to be killed) or device drivers bound
> to slots on the node (which need to be shut down). I do not think it
> makes much sense to attempt to combine all these at kernel level;
> that is more a job for a shell script. The bad experience with
> the current sysfs power management hierarchy shows that the kernel
> is a poor place to attempt this.

As you mentioned, there can be many more things. However, the
examples you gave should be handled by the individual hotplug
mechanisms. For instance, the processes bound to a certain CPU
should be handled by CPU hotplug when node hotplug invokes it.
Likewise, the device drivers bound to slots should be detached by
IO hotplug. Node hotplug only takes care of the container device
itself and invokes the individual hotplugs for everything else,
roughly as in the sketch below. It doesn't break the policy of
any individual hotplug (CPU, memory, IO, etc.).
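
Roughly, I imagine the hot-removal coordinator like this sketch,
where struct node_device, the iterator, and the IO/memory helpers
are hypothetical; cpu_down() is the existing CPU hotplug interface:

#include <linux/cpu.h>

/*
 * Hypothetical coordinator: node hotplug only walks the resources
 * that belong to the container and calls the existing per-resource
 * hotplug interfaces, which keep their own policies (CPU hotplug
 * deals with tasks bound to a CPU, IO hotplug detaches drivers
 * from slots).
 */
static int node_offline_resources(struct node_device *node)
{
        int cpu, err;

        /* 1. take the CPUs down */
        for_each_cpu_on_node(node, cpu) {       /* hypothetical */
                err = cpu_down(cpu);
                if (err)
                        return err;
        }

        /* 2. detach drivers and power off IO slots on the node */
        err = node_io_offline(node);            /* hypothetical */
        if (err)
                return err;

        /* 3. finally offline the node's memory */
        return node_memory_offline(node);       /* hypothetical */
}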

A shell script could drive the hot-removal of a container, but
detecting a newly added container device has to be done at the
kernel level.

> > when a hotplug request occurs on a node, node hotplug searches
> > the resources on the node and invokes CPU, memory, and IO hotplug
> > in the proper order. The order is actually very important. For
> > instance, when hot-adding a NUMA node, memory should be added
> > first so that the CPUs can allocate data in that memory while
> > they are being brought up. Otherwise, the CPUs have to allocate
> > it in memory on another node, which might cause a performance
> > problem.
>
> This ordering can also easily be done in user space.

Yes, it can; see the sketch below.
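
For example, a user-space program could enforce the order like
this. The CPU "online" file is the existing CPU hotplug sysfs
interface; the memory path is hypothetical, since mainline has no
memory hotplug sysfs interface yet:

/*
 * User-space sketch of the hot-add ordering: memory first, then
 * the CPUs of the node.
 */
#include <stdio.h>

static int bring_online(const char *path)
{
        FILE *f = fopen(path, "w");

        if (!f)
                return -1;
        fputs("1", f);
        return fclose(f);
}

int main(void)
{
        /* hypothetical memory attribute -- added first, so the
         * CPUs below can allocate node-local data */
        bring_online("/sys/devices/system/memory/memory0/online");

        /* then the node's CPUs, via the existing sysfs file */
        bring_online("/sys/devices/system/cpu/cpu1/online");
        return 0;
}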

> This has the advantage that if there is some reason to
> add CPUs without memory (or memory without CPUs, or PCI slots
> without anything) it will just work too. It is not clear
> that your "lump everything into one mechanism" approach can
> handle all that. Most likely you would need to add lots of
> special cases to it. Separate mechanisms can handle this more
> cleanly.

Yes, the number of cases (combinations of hardware) would grow.
However, I don't think there are that many of them today.

> Admittedly there would still need to be some coordination in case you
> would want to remove a whole building block of your machine like
> you said. A nice way to do this would be to add an atomic "to be removed"
> state to the individual unregister mechanisms that prevents the
> device from being reregistered until removed.

I agree. Actually, I have already prepared a "to be removed" state
in node hotplug, though it is not used yet :) It is also necessary
for the individual hotplugs, as you mentioned. Something like the
sketch below is what I have in mind.
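
A minimal sketch, with hypothetical names; only the atomic bit
operations are the real kernel interfaces:

#include <linux/bitops.h>
#include <linux/errno.h>

struct node_device {                    /* hypothetical */
        unsigned long flags;
        /* ... resources on the node ... */
};

#define NODE_TO_BE_REMOVED      0       /* bit in node->flags */

/* Setting the state is atomic; once it is set, registration is
 * refused until the device is physically removed. */
static int node_mark_to_be_removed(struct node_device *node)
{
        if (test_and_set_bit(NODE_TO_BE_REMOVED, &node->flags))
                return -EBUSY;          /* removal in progress */
        return 0;
}

static int node_register(struct node_device *node)
{
        if (test_bit(NODE_TO_BE_REMOVED, &node->flags))
                return -EBUSY;          /* refuse re-registration */
        /* ... normal registration ... */
        return 0;
}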

> > Design
> > ======
> > ACPI is used to do some hardware manipulation.
>
> That will not work as a portable mechanism. Most Linux ports,
> including some that support NUMA and hotplug, do not have ACPI.
>
> Rather, you could add a layer for this which is implemented
> by the architecture. Then all ACPI-supporting architectures could
> use a common implementation and the other architectures their own.
> The design of this layer would need some discussion first, to
> make sure it does not carry too many assumptions from your
> system that would not hold on others.

Are you saying that the platform-independent and platform-dependent
code should be separated, so that each platform (firmware) can
implement the dependent part in its own style, as in the sketch
below? If so, I agree, and some discussion is necessary :)
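
For example, the split could be an ops table like this (all names
here are hypothetical):

/*
 * Hypothetical split: the generic node hotplug code calls through
 * this table, and each architecture (or firmware) provides its own
 * implementation.  All ACPI platforms could share one.
 */
struct node_hotplug_ops {
        int (*probe_node)(int nid);     /* find resources on a node */
        int (*eject_node)(int nid);     /* power off / eject it */
};

static int acpi_probe_node(int nid);    /* hypothetical, ACPI-only */
static int acpi_eject_node(int nid);    /* hypothetical, ACPI-only */

/* common implementation shared by the ACPI architectures */
static struct node_hotplug_ops acpi_node_hotplug_ops = {
        .probe_node = acpi_probe_node,
        .eject_node = acpi_eject_node,
};

/* a non-ACPI port registers its own table instead */
int register_node_hotplug_ops(struct node_hotplug_ops *ops);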

Thanks,
Kei