RE: [PATCH v2] cpu-topology: Skip the exist but not possible cpu nodes

From: Zengtao (B)
Date: Mon Jan 13 2020 - 20:42:36 EST


> -----Original Message-----
> From: Sudeep Holla [mailto:sudeep.holla@xxxxxxx]
> Sent: Monday, January 13, 2020 8:21 PM
> To: Zengtao (B)
> Cc: Linuxarm; Greg Kroah-Hartman; Rafael J. Wysocki; Sudeep Holla;
> linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v2] cpu-topology: Skip the exist but not possible cpu nodes
>
> On Mon, Jan 13, 2020 at 12:06:11PM +0000, Zengtao (B) wrote:
> > > -----Original Message-----
> > > From: Sudeep Holla [mailto:sudeep.holla@xxxxxxx]
> > > Sent: Monday, January 13, 2020 6:19 PM
> > > To: Zengtao (B)
> > > Cc: Linuxarm; Greg Kroah-Hartman; Rafael J. Wysocki; Sudeep Holla;
> > > linux-kernel@xxxxxxxxxxxxxxx
> > > Subject: Re: [PATCH v2] cpu-topology: Skip the exist but not possible cpu nodes
> > >
> > > On Sat, Jan 11, 2020 at 02:53:40PM +0800, Zeng Tao wrote:
> > > > When CONFIG_NR_CPUS is smaller than the cpu nodes defined in the device
> > > > tree, all the cpu nodes parsing will fail.
> > > > And this is not reasonable for a legal device tree configs.
> > > > In this patch, skip such cpu nodes rather than return an error.
> > > > With CONFIG_NR_CPUS = 128 and cpus nodes num in device tree is 130,
> > > > The following warning messages will be print during boot:
> > > > CPU node for /cpus/cpu@128 exist but the possible cpu range is :0-127
> > > > CPU node for /cpus/cpu@129 exist but the possible cpu range is :0-127
> > > > CPU node for /cpus/cpu@130 exist but the possible cpu range is :0-127
> > > >
> > > > Signed-off-by: Zeng Tao <prime.zeng@xxxxxxxxxxxxx>
> > > > ---
> > > > Changelog:
> > > > v1->v2:
> > > > -Remove redundant -ENODEV assignment in get_cpu_for_node
> > > > -Add comment to describe the get_cpu_for_node return values
> > > > -Add skip process for cpu threads
> > > > -Update the commit log with more detail
> > > > ---
> > > > drivers/base/arch_topology.c | 37 +++++++++++++++++++++++++++++--------
> > > > 1 file changed, 29 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> > > > index 5fe44b3..01f0e21 100644
> > > > --- a/drivers/base/arch_topology.c
> > > > +++ b/drivers/base/arch_topology.c
> > > > @@ -248,22 +248,44 @@ core_initcall(free_raw_capacity);
> > > > #endif
> > > >
> > > > #if defined(CONFIG_ARM64) || defined(CONFIG_RISCV)
> > > > +/*
> > > > + * This function returns the logic cpu number of the node.
> > > > + * There are totally three kinds of return values:
> > > > + * (1) logic cpu number which is > 0.
> > > > + * (2) -ENDEV when the node is valid one which can be found in the device tree
> > > > + * but there is no possible cpu nodes to match, when the CONFIG_NR_CPUS is
> > > > + * smaller than cpus node numbers in device tree, this will happen. It's
> > > > + * suggested to just ignore this case.
> > >
> > > s/ENDEV/ENODEV/
> > Good catch, thanks.
> >
> > >
> > > Also as I mentioned earlier, I prefer not to add any extra logic here
> > > other than the above comment to make it explicit. This triggers unnecessary
> > > warnings when someone boots with limited CPUs for valid reasons.
> > >
> >
> > So , what 's your suggestion here? Just keep the comments but remove
> > the warning message print?
>
> Yes for all the "found" logic. I am fine to update the existing err
>

Fine, I will take it.
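
For reference, a small userspace model of the return values documented in the proposed comment, handled the way Sudeep suggests: an `else if (cpu == -ENODEV)` branch that only logs at pr_info() level. This is a sketch under stated assumptions, not kernel code: MODEL_NR_CPUS, the model_* functions and the log-level enum are hypothetical, and of_cpu_node_to_id() is simplified to an identity mapping over the possible range.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace model only. MODEL_NR_CPUS stands in for CONFIG_NR_CPUS and a
 * cpu node is identified by its index under /cpus. None of the model_*
 * names are kernel APIs.
 */
#define MODEL_NR_CPUS 128

enum model_log_level { MODEL_LOG_NONE, MODEL_LOG_INFO, MODEL_LOG_CRIT };

/* Stand-in for of_cpu_node_to_id(): identity mapping for nodes inside the
 * possible range, -ENODEV for everything else (a simplification). */
static int model_cpu_node_to_id(int node_index)
{
	if (node_index >= 0 && node_index < MODEL_NR_CPUS)
		return node_index;
	return -ENODEV;
}

/*
 * Model of get_cpu_for_node() with an -ENODEV branch at info level.
 * has_phandle models whether of_parse_phandle() found a "cpu" node;
 * *level records which message, if any, the real code would print.
 */
static int model_get_cpu_for_node(bool has_phandle, int node_index,
				  enum model_log_level *level)
{
	int cpu;

	assert(level != NULL);
	*level = MODEL_LOG_NONE;

	if (!has_phandle)
		return -EINVAL;		/* no "cpu" phandle: error, no message */

	cpu = model_cpu_node_to_id(node_index);
	if (cpu >= 0) {
		/* topology_parse_cpu_capacity() would run here */
	} else if (cpu == -ENODEV) {
		/* node exists but CONFIG_NR_CPUS excludes it: info only */
		*level = MODEL_LOG_INFO;
		fprintf(stderr, "CPU node %d outside possible range 0-%d\n",
			node_index, MODEL_NR_CPUS - 1);
	} else {
		*level = MODEL_LOG_CRIT;	/* pr_crit() for any other error */
	}
	return cpu;
}
```

With MODEL_NR_CPUS at 128, node 128 comes back as -ENODEV with only an info-level message, which is the reduced CONFIG_NR_CPUS case.
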
> > >
> > > > + * (3) -EINVAL when other errors occur.
> > > > + */
> > > > static int __init get_cpu_for_node(struct device_node *node)
> > > > {
> > > > - struct device_node *cpu_node;
> > > > + struct device_node *cpu_node, *t;
> > > > int cpu;
> > > > + bool found = false;
> > > >
> > > > cpu_node = of_parse_phandle(node, "cpu", 0);
> > > > if (!cpu_node)
> > > > - return -1;
> > > > + return -EINVAL;
> > > > +
> > > > + for_each_of_cpu_node(t)
> > > > + if (t == cpu_node) {
> > > > + found = true;
> > > > + break;
> > > > + }
> > > > +
> > > > + if (!found) {
> > > > + pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
> > > > + return -EINVAL;
> > > > + }
>
> Drop all the above change.

Could you explain this a bit more?
As I understand it, there are two abnormal cases:
1. The cpu node exists in the device tree, but is not a possible cpu.
This case can be caught by the return value of of_cpu_node_to_id().
2. The cpu node does not exist.
This case can be caught by the logic above. Or do you think the
return value of of_parse_phandle() is enough?

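To make the question concrete, here is a rough userspace model of both lookups. It assumes that of_cpu_node_to_id() only matches nodes belonging to possible CPUs and returns -ENODEV otherwise; the struct and the model_* functions are hypothetical stand-ins for of_parse_phandle(), for_each_of_cpu_node() and of_cpu_node_to_id().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 128	/* hypothetical stand-in for CONFIG_NR_CPUS */

/* Hypothetical model of the node a "cpu" phandle resolves to. */
struct model_node {
	bool is_cpu_node;	/* would be visited by for_each_of_cpu_node() */
	int cpu_index;		/* position among cpu nodes, if is_cpu_node */
};

/* Stand-in for of_cpu_node_to_id(): matches only possible cpus' nodes. */
static int model_cpu_node_to_id(const struct model_node *np)
{
	assert(np != NULL);
	if (np->is_cpu_node && np->cpu_index < MODEL_NR_CPUS)
		return np->cpu_index;
	return -ENODEV;
}

/* Without the v2 scan: only the NULL check plus of_cpu_node_to_id(). */
static int model_lookup_without_scan(const struct model_node *np)
{
	if (np == NULL)			/* of_parse_phandle() returned NULL */
		return -EINVAL;
	return model_cpu_node_to_id(np);
}

/* With the v2 scan: also reject targets that are not cpu nodes at all. */
static int model_lookup_with_scan(const struct model_node *np)
{
	if (np == NULL)
		return -EINVAL;
	if (!np->is_cpu_node)		/* not found by for_each_of_cpu_node() */
		return -EINVAL;
	return model_cpu_node_to_id(np);
}
```

In this model the scan only changes the result when the phandle resolves to a node that is not a cpu node (-EINVAL instead of -ENODEV). A missing phandle is already caught by the NULL check, and a cpu node beyond CONFIG_NR_CPUS gives -ENODEV either way.

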
>
> > > >
> > > > cpu = of_cpu_node_to_id(cpu_node);
> > > > if (cpu >= 0)
> > > > topology_parse_cpu_capacity(cpu_node, cpu);
>
> You can add here: else if (cpu == -ENODEV)
> pr_info(...whatever you have below..)
>
> Other things as is. Warning may be too harsh if one is running with
> reduced number of CPUs.
>
> > > > else
> > > > - pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
> > > > + pr_warn("CPU node for %pOF exist but the possible cpu range is :%*pbl\n",
> > > > + cpu_node, cpumask_pr_args(cpu_possible_mask));
> > > >
> > > > - of_node_put(cpu_node);
> > >
> > > Why is this dropped ?
> >
> > It's unnecessary here since no one get the node ref.
> >
>
> Please read the description of of_parse_phandle. If you find other
> issues with existing code, address it in separate patch and don't mix
> with the issue in $subject.
>
^_^, got it, I will remove that change and keep the of_node_put() call. Thanks.
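
The reason for keeping the call: of_parse_phandle() returns the node with its refcount incremented, and the caller is expected to drop that reference with of_node_put() when done. A minimal userspace model of that balance, with a hypothetical struct and model_* helpers in place of the real OF API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a refcounted device_node. */
struct model_node {
	int refcount;
};

/* Models of_parse_phandle(): returns the node with its refcount raised,
 * or NULL when there is no phandle to follow. */
static struct model_node *model_parse_phandle(struct model_node *target)
{
	if (target != NULL)
		target->refcount++;
	return target;
}

/* Models of_node_put(): drops one reference; NULL is accepted. */
static void model_node_put(struct model_node *np)
{
	if (np == NULL)
		return;
	assert(np->refcount > 0);	/* catches an unbalanced put */
	np->refcount--;
}

/* Models get_cpu_for_node(); 'balanced' selects whether the put is kept. */
static int model_get_cpu_for_node(struct model_node *target, bool balanced)
{
	struct model_node *cpu_node = model_parse_phandle(target);

	if (cpu_node == NULL)
		return -1;
	/* ... cpu lookup and capacity parsing would happen here ... */
	if (balanced)
		model_node_put(cpu_node);
	return 0;
}
```

Starting from one reference held by the tree, the balanced path leaves the count at 1, while the path without the put leaves it at 2, i.e. one leaked reference per call.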

Regards
Zengtao