Re: [PATCH: 002/017] Memory hotplug for new nodes v.4. (change name old add_memory() to arch_add_memory())

From: KAMEZAWA Hiroyuki
Date: Tue Mar 21 2006 - 19:02:59 EST


On Tue, 21 Mar 2006 10:00:12 -0800
Dave Hansen <haveblue@xxxxxxxxxx> wrote:

> On Sat, 2006-03-18 at 10:26 +0900, KAMEZAWA Hiroyuki wrote:
> > If *determine node* function is moved to arch specific parts,
> > memory hot add need more and more codes to determine paddr -> nid in arch
> > specific codes. Then, we have to add new paddr->nid function even if new nid is
> > passed by firmware. We *lose* useful information of nid from firmware if
> > add_memory() has just 2 args, (start, end).
>
> What I'm saying is that I'd like add_memory() to be just that, for
> adding memory.
>
> At some point in the process, you need to export the NUMA node layout to
> the rest of the system, to say which pages go in which node. I'm just
> saying that you should do that _before_ add_memory().
>

To do so, we would have to maintain a new pfn_to_nid() function.
We would have to maintain a new table/list and have to think of a name for it :).
And add_memory() would have to check, again, whether the node that the memory
belongs to exists or not.
I don't want these kinds of things.
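
To make the cost concrete, here is a minimal userspace sketch (not kernel
code) of the extra bookkeeping a two-step scheme would seem to need. The names
nid_range, register_nid_range() and paddr_to_nid() are hypothetical and exist
only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 16
#define NID_NONE   (-1)

/* Hypothetical table entry: one physical range announced ahead of add_memory(). */
struct nid_range {
    uint64_t start;   /* first physical address of the range */
    uint64_t end;     /* one past the last physical address */
    int nid;          /* node id reported by firmware */
};

static struct nid_range nid_ranges[MAX_RANGES];
static size_t nr_nid_ranges;

/* Step 1 of the two-step scheme: record which node a range belongs to. */
static int register_nid_range(uint64_t start, uint64_t end, int nid)
{
    if (nr_nid_ranges == MAX_RANGES || start >= end)
        return -1;
    nid_ranges[nr_nid_ranges++] = (struct nid_range){ start, end, nid };
    return 0;
}

/* Step 2: a two-argument add_memory() has to rediscover the node by lookup. */
static int paddr_to_nid(uint64_t paddr)
{
    for (size_t i = 0; i < nr_nid_ranges; i++)
        if (paddr >= nid_ranges[i].start && paddr < nid_ranges[i].end)
            return nid_ranges[i].nid;
    return NID_NONE;
}
```

If the nid is passed to add_memory() directly, the caller already holds the
value from firmware and none of this table is needed.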

With the current kernel, we have to add a new *pgdat* for the node when adding
a new node. (If we don't, the kernel panic()s.) And in the future we have to
allocate the pgdat/zones on the local node. So adding a node before adding its
memory is not good: there is no local memory to allocate from yet.
(The current code uses kmalloc() just to reduce complexity.)
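
A rough userspace sketch of that flow, assuming the generic entry point takes
the nid and hands the range on to the arch-specific part. The *_sketch names
and struct pgdat_sketch are hypothetical stand-ins, and calloc() plays the
role of kmalloc(); this is not the code in the patch.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_NUMNODES 8

/* Stand-in for the kernel's per-node data; the real pg_data_t is much larger. */
struct pgdat_sketch {
    int nid;
    uint64_t spanned_bytes;
};

static struct pgdat_sketch *node_data[MAX_NUMNODES];

/* Stand-in for the arch-specific part (the renamed old add_memory()). */
static int arch_add_memory_sketch(int nid, uint64_t start, uint64_t size)
{
    (void)start;                      /* a real version would map this range */
    node_data[nid]->spanned_bytes += size;
    return 0;
}

/* Generic entry point: the caller passes the nid it got from firmware. */
static int add_memory_sketch(int nid, uint64_t start, uint64_t size)
{
    int new_node = 0;
    int ret;

    if (nid < 0 || nid >= MAX_NUMNODES || size == 0)
        return -1;

    if (!node_data[nid]) {
        /* New node: create its pgdat as part of adding the memory. */
        node_data[nid] = calloc(1, sizeof(*node_data[nid]));
        if (!node_data[nid])
            return -1;
        node_data[nid]->nid = nid;
        new_node = 1;
    }

    ret = arch_add_memory_sketch(nid, start, size);
    if (ret && new_node) {
        /* Roll back the pgdat if the arch part failed for a new node. */
        free(node_data[nid]);
        node_data[nid] = NULL;
    }
    return ret;
}
```

The node check and the memory add happen in one call here, so no separate
"does this node exist" lookup is needed afterwards.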

> add_memory() should support adding memory to more than one node. If any
> hypervisor or hardware happens to have memory added in one contiguous
> chunk, it can not simply call add_memory(). _That_ firmware would be
> forced to do the NUMA parsing and figure out how many times to call
> add_memory().

I don't think firmware adds memory for multiple nodes at once.
It's crazy.

>
> Let me reiterate: the process of telling the system which pages are in
> which node should be separate from telling the system that there *are*
> currently pages there now.

Considering a "cpu only node", the "check and add new node" function can be
separated out, like add_memory_less_node(). (But the pgdat/zones etc. will be
allocated outside the node.)

Bye.
-- Kame
