RE: [RFC PATCH v5 1/4] topology: Represent clusters of CPUs within a die

From: Song Bao Hua (Barry Song)
Date: Wed Apr 21 2021 - 00:06:14 EST




> -----Original Message-----
> From: Song Bao Hua (Barry Song)
> Sent: Tuesday, April 20, 2021 3:24 PM
> To: 'Greg KH' <gregkh@xxxxxxxxxxxxxxxxxxx>; Jonathan Cameron
> <jonathan.cameron@xxxxxxxxxx>
> Cc: tim.c.chen@xxxxxxxxxxxxxxx; catalin.marinas@xxxxxxx; will@xxxxxxxxxx;
> rjw@xxxxxxxxxxxxx; vincent.guittot@xxxxxxxxxx; bp@xxxxxxxxx;
> tglx@xxxxxxxxxxxxx; mingo@xxxxxxxxxx; lenb@xxxxxxxxxx; peterz@xxxxxxxxxxxxx;
> dietmar.eggemann@xxxxxxx; rostedt@xxxxxxxxxxx; bsegall@xxxxxxxxxx;
> mgorman@xxxxxxx; msys.mizuma@xxxxxxxxx; valentin.schneider@xxxxxxx;
> juri.lelli@xxxxxxxxxx; mark.rutland@xxxxxxx; sudeep.holla@xxxxxxx;
> aubrey.li@xxxxxxxxxxxxxxx; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; linux-acpi@xxxxxxxxxxxxxxx; x86@xxxxxxxxxx;
> xuwei (O) <xuwei5@xxxxxxxxxx>; Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>;
> guodong.xu@xxxxxxxxxx; yangyicong <yangyicong@xxxxxxxxxx>; Liguozhu (Kenneth)
> <liguozhu@xxxxxxxxxxxxx>; linuxarm@xxxxxxxxxxxxx; hpa@xxxxxxxxx; tiantao (H)
> <tiantao6@xxxxxxxxxxxxx>
> Subject: RE: [RFC PATCH v5 1/4] topology: Represent clusters of CPUs within
> a die
>
>
>
> > -----Original Message-----
> > From: Greg KH [mailto:gregkh@xxxxxxxxxxxxxxxxxxx]
> > Sent: Friday, March 19, 2021 11:02 PM
> > To: Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>
> > Cc: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>;
> > tim.c.chen@xxxxxxxxxxxxxxx; catalin.marinas@xxxxxxx; will@xxxxxxxxxx;
> > rjw@xxxxxxxxxxxxx; vincent.guittot@xxxxxxxxxx; bp@xxxxxxxxx;
> > tglx@xxxxxxxxxxxxx; mingo@xxxxxxxxxx; lenb@xxxxxxxxxx;
> peterz@xxxxxxxxxxxxx;
> > dietmar.eggemann@xxxxxxx; rostedt@xxxxxxxxxxx; bsegall@xxxxxxxxxx;
> > mgorman@xxxxxxx; msys.mizuma@xxxxxxxxx; valentin.schneider@xxxxxxx;
> > juri.lelli@xxxxxxxxxx; mark.rutland@xxxxxxx; sudeep.holla@xxxxxxx;
> > aubrey.li@xxxxxxxxxxxxxxx; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
> > linux-kernel@xxxxxxxxxxxxxxx; linux-acpi@xxxxxxxxxxxxxxx; x86@xxxxxxxxxx;
> > xuwei (O) <xuwei5@xxxxxxxxxx>; Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>;
> > guodong.xu@xxxxxxxxxx; yangyicong <yangyicong@xxxxxxxxxx>; Liguozhu
> (Kenneth)
> > <liguozhu@xxxxxxxxxxxxx>; linuxarm@xxxxxxxxxxxxx; hpa@xxxxxxxxx
> > Subject: Re: [RFC PATCH v5 1/4] topology: Represent clusters of CPUs within
> > a die
> >
> > On Fri, Mar 19, 2021 at 09:36:16AM +0000, Jonathan Cameron wrote:
> > > On Fri, 19 Mar 2021 06:57:08 +0000
> > > "Song Bao Hua (Barry Song)" <song.bao.hua@xxxxxxxxxxxxx> wrote:
> > >
> > > > > -----Original Message-----
> > > > > From: Greg KH [mailto:gregkh@xxxxxxxxxxxxxxxxxxx]
> > > > > Sent: Friday, March 19, 2021 7:35 PM
> > > > > To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>
> > > > > Cc: tim.c.chen@xxxxxxxxxxxxxxx; catalin.marinas@xxxxxxx;
> > will@xxxxxxxxxx;
> > > > > rjw@xxxxxxxxxxxxx; vincent.guittot@xxxxxxxxxx; bp@xxxxxxxxx;
> > > > > tglx@xxxxxxxxxxxxx; mingo@xxxxxxxxxx; lenb@xxxxxxxxxx;
> > peterz@xxxxxxxxxxxxx;
> > > > > dietmar.eggemann@xxxxxxx; rostedt@xxxxxxxxxxx; bsegall@xxxxxxxxxx;
> > > > > mgorman@xxxxxxx; msys.mizuma@xxxxxxxxx; valentin.schneider@xxxxxxx;
> > Jonathan
> > > > > Cameron <jonathan.cameron@xxxxxxxxxx>; juri.lelli@xxxxxxxxxx;
> > > > > mark.rutland@xxxxxxx; sudeep.holla@xxxxxxx; aubrey.li@xxxxxxxxxxxxxxx;
> > > > > linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > > > > linux-acpi@xxxxxxxxxxxxxxx; x86@xxxxxxxxxx; xuwei (O)
> > <xuwei5@xxxxxxxxxx>;
> > > > > Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>; guodong.xu@xxxxxxxxxx;
> > yangyicong
> > > > > <yangyicong@xxxxxxxxxx>; Liguozhu (Kenneth) <liguozhu@xxxxxxxxxxxxx>;
> > > > > linuxarm@xxxxxxxxxxxxx; hpa@xxxxxxxxx
> > > > > Subject: Re: [RFC PATCH v5 1/4] topology: Represent clusters of CPUs
> within
> > > > > a die
> > > > >
> > > > > On Fri, Mar 19, 2021 at 05:16:15PM +1300, Barry Song wrote:
> > > > > > diff --git a/Documentation/admin-guide/cputopology.rst
> > > > > b/Documentation/admin-guide/cputopology.rst
> > > > > > index b90dafc..f9d3745 100644
> > > > > > --- a/Documentation/admin-guide/cputopology.rst
> > > > > > +++ b/Documentation/admin-guide/cputopology.rst
> > > > > > @@ -24,6 +24,12 @@ core_id:
> > > > > > identifier (rather than the kernel's). The actual value is
> > > > > > architecture and platform dependent.
> > > > > >
> > > > > > +cluster_id:
> > > > > > +
> > > > > > + the Cluster ID of cpuX. Typically it is the hardware platform's
> > > > > > + identifier (rather than the kernel's). The actual value is
> > > > > > + architecture and platform dependent.
> > > > > > +
> > > > > > book_id:
> > > > > >
> > > > > > the book ID of cpuX. Typically it is the hardware platform's
> > > > > > @@ -56,6 +62,14 @@ package_cpus_list:
> > > > > > human-readable list of CPUs sharing the same physical_package_id.
> > > > > > (deprecated name: "core_siblings_list")
> > > > > >
> > > > > > +cluster_cpus:
> > > > > > +
> > > > > > + internal kernel map of CPUs within the same cluster.
> > > > > > +
> > > > > > +cluster_cpus_list:
> > > > > > +
> > > > > > + human-readable list of CPUs within the same cluster.
> > > > > > +
> > > > > > die_cpus:
> > > > > >
> > > > > > internal kernel map of CPUs within the same die.
> > > > >
> > > > > Why are these sysfs files in this file, and not in a Documentation/ABI/
> > > > > file which can be correctly parsed and shown to userspace?
> > > >
> > > > Well, those ABIs have been there for a long time. They look like this:
> > > >
> > > > [root@ceph1 topology]# ls
> > > > core_id core_siblings core_siblings_list physical_package_id thread_siblings thread_siblings_list
> > > > [root@ceph1 topology]# pwd
> > > > /sys/devices/system/cpu/cpu100/topology
> > > > [root@ceph1 topology]# cat core_siblings_list
> > > > 64-127
> > > > [root@ceph1 topology]#
> > > >
> > > > >
> > > > > Any chance you can fix that up here as well?
> > > >
> > > > Yes, we will send a separate patch to address this; it won't be
> > > > part of this patchset. This patchset will be based on that one.
> > > >
> > > > >
> > > > > Also note that "list" is not something that goes in sysfs, sysfs is
> > > > > "one value per file", and a list is not "one value". How do you prevent
> > > > > overflowing the buffer of the sysfs file if you have a "list"?
> > > > >
> > > >
> > > > At a glance, the list uses a "-" range rather than enumerating every CPU:
> > > > [root@ceph1 topology]# cat core_siblings_list
> > > > 64-127
> > > >
> > > > Anyway, I will check whether it has any chance of overflowing.
> > >
> > > It could in theory be alternate CPUs as a comma-separated list.
> > > So it would get interesting around 500-1000 cpus (guessing).
> > >
> > > Hopefully no one has that crazy a cpu numbering scheme but it's possible
> > > (note that cluster is fine for this, but I guess it might eventually
> > > happen for the core-siblings list (cpus within a package)).
> > >
> > > Shouldn't crash or anything like that but might terminate early.
> >
> > We have a broken sysfs api already for listing LED numbers that has had
> > to be worked around in the past, please do not create a new one with
> > that same problem, we should learn from them :)
>
> Another place I am seeing a cpu list is in numa topology:
> /sys/devices/system/node/nodex/cpulist.
>
> But the code has a BUILD_BUG_ON to guard the pagebuf:
>
> static ssize_t node_read_cpumap(struct device *dev, bool list, char *buf)
> {
> 	ssize_t n;
> 	cpumask_var_t mask;
> 	struct node *node_dev = to_node(dev);
>
> 	/* 2008/04/07: buf currently PAGE_SIZE, need 9 chars per 32 bits. */
> 	BUILD_BUG_ON((NR_CPUS/32 * 9) > (PAGE_SIZE-1));
>
> 	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
> 		return 0;
>
> 	cpumask_and(mask, cpumask_of_node(node_dev->dev.id), cpu_online_mask);
> 	n = cpumap_print_to_pagebuf(list, buf, mask);
> 	free_cpumask_var(mask);
>
> 	return n;
> }
>
> For the lists in cpu topology, I haven't seen an equivalent guard, though I
> believe we need one. Or am I missing something?

I would prefer that we send two patches as a series,
"clarify and cleanup CPU and NUMA topology ABIs", with a cover
letter. The patch below would be 1/2, and 2/2 would be the patch
that moves the cpu topology ABI doc.