Re: [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations

From: Michael Ellerman
Date: Mon Oct 16 2017 - 08:33:08 EST


Michael Bringmann <mwb@xxxxxxxxxxxxxxxxxx> writes:

> powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU

This is a powerpc-only patch, so saying "systems like PowerPC" is
confusing. What you should be saying is "On pseries systems".

> or memory resources, it may occur that the new resources are to be
> inserted into nodes that were not used for these resources at bootup.
> In the kernel, any node that is used must be defined and initialized
> at boot.
>
> This patch extracts the value of the lowest domain level (number of
> allocable resources) from the "rtas" device tree property
> "ibm,current-associativity-domains" or the device tree property

What is current associativity domains? I've not heard of it, where is it
documented, and what does it mean.

Why would use the "current" set vs the "max"? I thought the whole point
was to discover the maximum possible set of nodes that could be
hotplugged.

> "ibm,max-associativity-domains" to use as the maximum number of nodes
> to setup as possibly available in the system. This new setting will
> override the instruction,
>
> nodes_and(node_possible_map, node_possible_map, node_online_map);
>
> presently seen in the function arch/powerpc/mm/numa.c:initmem_init().
>
> If the property is not present at boot, no operation will be performed
> to define or enable additional nodes.
>
> Signed-off-by: Michael Bringmann <mwb@xxxxxxxxxxxxxxxxxx>
> ---
> arch/powerpc/mm/numa.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 47 insertions(+)
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index ec098b3..b385cd0 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
> NODE_DATA(nid)->node_spanned_pages = spanned_pages;
> }
>
> +static void __init node_associativity_setup(void)

This should really be called "find_possible_nodes()" or something more
descriptive.

> +{
> + struct device_node *rtas;
> +
> + rtas = of_find_node_by_path("/rtas");
> + if (rtas) {

If you just short-circuit that return the whole function body can be
deintented, making it significantly more readable.

ie:
+ rtas = of_find_node_by_path("/rtas");
+ if (!rtas)
+ return;

> + const __be32 *prop;
> + u32 len, entries, numnodes, i;
> +
> + prop = of_get_property(rtas,
> + "ibm,current-associativity-domains", &len);

Please don't use of_get_property() in new code, we have much better
accessors these days, which do better error checking and handle the
endian conversions for you.

In this case you'd use eg:

u32 entries;
rc = of_property_read_u32(rtas, "ibm,current-associativity-domains", &entries);

> + if (!prop || len < sizeof(unsigned int)) {
> + prop = of_get_property(rtas,
> + "ibm,max-associativity-domains", &len);
> + goto endit;
> + }
> +
> + entries = of_read_number(prop++, 1);
> +
> + if (len < (entries * sizeof(unsigned int)))
> + goto endit;
> +
> + if ((0 <= min_common_depth) && (min_common_depth <= (entries-1)))
> + entries = min_common_depth;
> + else
> + entries -= 1;
^
You can't just guess that will be the right entry.

If min_common_depth is < 0 the function should have just returned
immediately at the top.

If min_common_depth is outside the range of the property that's a buggy
device tree, you should print a warning and return.

> + numnodes = of_read_number(&prop[entries], 1);

u32 num_nodes;
rc = of_property_read_u32_index(rtas, "ibm,current-associativity-domains", min_common_depth, &num_nodes);
> +
> + printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes,
> + min_common_depth);
> +
> + for (i = 0; i < numnodes; i++) {
> + if (!node_possible(i)) {
> + setup_node_data(i, 0, 0);

Do we actually need to setup the NODE_DATA() yet? Doing it now ensures
it will not be allocated node local, which sucks.

> + node_set(i, node_possible_map);
> + }
> + }
> + }
> +
> +endit:

"out" would be the normal name.

> + if (rtas)
> + of_node_put(rtas);
> +}
> +
> void __init initmem_init(void)
> {
> int nid, cpu;
> @@ -911,6 +956,8 @@ void __init initmem_init(void)
> */

You need to update the comment above here which is contradicted by the
new function you're adding.

> nodes_and(node_possible_map, node_possible_map, node_online_map);
>
> + node_associativity_setup();
> +
> for_each_online_node(nid) {
> unsigned long start_pfn, end_pfn;
>

cheers