Re: NUMA aware slab allocator V3

From: Dave Hansen
Date: Mon May 16 2005 - 13:18:58 EST


On Mon, 2005-05-16 at 10:54 -0700, Christoph Lameter wrote:
> > Remember, as you saw, you can't assume that MAX_NUMNODES=1 when NUMA=n
> > because of the DISCONTIG=y case.
>
> I have never seen such a machine. An SMP machine with multiple
> "nodes"?

Yes, "discontigmem nodes".

> So essentially one NUMA node has multiple discontig "nodes"?

Yes, in theory.

A discontig node is just a contiguous area of physical memory.
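
To make that concrete, here is a rough userspace sketch, not kernel code.
The struct, the table and pfn_to_range() are all made up for illustration.
It only shows the idea that each discontig "node" is one contiguous run of
page frames, and that a plain SMP box can have several of them with holes
in between:

```c
#include <stddef.h>

/* Hypothetical model: one entry per discontig "node". */
struct mem_range {
	unsigned long start_pfn;	/* first page frame in the range */
	unsigned long nr_pages;		/* number of page frames in the range */
};

/* Made-up layout: two contiguous ranges separated by a hole. */
static const struct mem_range ranges[] = {
	{ 0x00000, 0x40000 },
	{ 0x80000, 0x40000 },
};

#define NR_RANGES (sizeof(ranges) / sizeof(ranges[0]))

/* Return the index of the range holding pfn, or -1 if pfn is in a hole. */
static int pfn_to_range(unsigned long pfn)
{
	for (size_t i = 0; i < NR_RANGES; i++) {
		if (pfn >= ranges[i].start_pfn &&
		    pfn - ranges[i].start_pfn < ranges[i].nr_pages)
			return (int)i;
	}
	return -1;
}
```

Nothing in that table says anything about memory locality, which is why
such a "node" is not the same thing as a NUMA node.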

> This means that the concept of a node suddenly changes if there is just
> one numa node (CONFIG_NUMA off implies one numa node)?

Correct as well.

> > So, in summary, if you want to do it right: use the
> > CONFIG_NEED_MULTIPLE_NODES that you see in -mm. As plain DISCONTIG=y
> > gets replaced by sparsemem any code using this is likely to stay
> > working.
>
> s/CONFIG_NUMA/CONFIG_NEED_MULTIPLE_NODES?
>
> That will not work because the idea is to localize the slabs to each
> node.
>
> If there are multiple nodes per numa node then invariably one node in the
> numa node (sorry for this duplication of what node means but I did not
> do it) must be preferred since numa_node_id() does not return a set of
> discontig nodes.

I know it's confusing. I feel your pain :)

You're right. I think what you want here is CONFIG_NUMA, not
NEED_MULTIPLE_NODES. So, toss out that #ifdef, and everything should be
in pretty good shape. Just don't make any assumptions about how many
'struct zone' or 'pg_data_t's a single "node's" pages can come from.
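
As a rough illustration of the distinction (a userspace mock-up, not the
real kernel headers: the #defines below only stand in for Kconfig symbols,
the MAX_NUMNODES value is invented, and NR_SLAB_LISTS, current_numa_node
and slab_list_index() are hypothetical names), keying the per-node lists
on CONFIG_NUMA rather than on MAX_NUMNODES might look like this:

```c
/* Stand-ins for Kconfig output: NUMA=n, DISCONTIG=y. */
/* #define CONFIG_NUMA 1 */
#define CONFIG_DISCONTIGMEM 1

/* Simplified assumption: either option can give more than one "node". */
#if defined(CONFIG_NUMA) || defined(CONFIG_DISCONTIGMEM)
#define MAX_NUMNODES 4		/* made-up value */
#else
#define MAX_NUMNODES 1
#endif

#ifdef CONFIG_NUMA
/* One set of slab lists per NUMA node, indexed by the local node. */
#define NR_SLAB_LISTS MAX_NUMNODES
static int current_numa_node;	/* hypothetical stand-in for numa_node_id() */
static int slab_list_index(void)
{
	return current_numa_node;
}
#else
/* No locality to exploit: one set of lists, however many "nodes" exist. */
#define NR_SLAB_LISTS 1
static int slab_list_index(void)
{
	return 0;
}
#endif
```

With NUMA=n and DISCONTIG=y as above, MAX_NUMNODES is still greater than 1,
but the allocator only keeps a single set of lists.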

Although it doesn't help with your issue, you may want to read the comments
in this patch. I wrote them when my brain was twisting around the same issues:

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc4/2.6.12-rc4-mm2/broken-out/introduce-new-kconfig-option-for-numa-or-discontig.patch

> Sorry but this all sounds like a flaw in the design. There is no
> consistent notion of node.

It's not really a flaw in the design. It's a misinterpretation of the
original design that crept in as new architectures implemented things.
I hope to completely ditch DISCONTIGMEM, eventually.

> Are you sure that this is not a ppc64 screwup?

Yeah, ppc64 is not at fault, it just provides the most obvious exposure
of the issue.

-- Dave
