RE: [PATCH v4 0/4] use bin_attribute to avoid cpumap buff overflow

From: Song Bao Hua (Barry Song)
Date: Thu Jun 17 2021 - 19:48:59 EST




> -----Original Message-----
> From: Yury Norov [mailto:yury.norov@xxxxxxxxx]
> Sent: Friday, June 18, 2021 9:18 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>
> Cc: gregkh@xxxxxxxxxxxxxxxxxxx; andriy.shevchenko@xxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; linux@xxxxxxxxxxxxxxxxxx; rafael@xxxxxxxxxx;
> akpm@xxxxxxxxxxxxxxxxxxxx; rdunlap@xxxxxxxxxxxxx; agordeev@xxxxxxxxxxxxx;
> sbrivio@xxxxxxxxxx; jianpeng.ma@xxxxxxxxx; valentin.schneider@xxxxxxx;
> peterz@xxxxxxxxxxxxx; bristot@xxxxxxxxxx; guodong.xu@xxxxxxxxxx;
> tangchengchang <tangchengchang@xxxxxxxxxx>; Zengtao (B)
> <prime.zeng@xxxxxxxxxxxxx>; yangyicong <yangyicong@xxxxxxxxxx>;
> tim.c.chen@xxxxxxxxxxxxxxx; tiantao (H) <tiantao6@xxxxxxxxxxxxx>; Jonathan
> Cameron <jonathan.cameron@xxxxxxxxxx>; Linuxarm <linuxarm@xxxxxxxxxx>
> Subject: Re: [PATCH v4 0/4] use bin_attribute to avoid cpumap buff overflow
>
> On Thu, Jun 17, 2021 at 10:19:06PM +1200, Barry Song wrote:
> > patch #1 adds a new function cpumap_print_to_buf and patch #2 uses
> > this function in drivers/base/topology.c, and patch #3 uses this new
> > function in drivers/base/node.c.
> > patch #4 adds test cases for the new API.
>
> So, the root problem here is that some machines have so many CPUs that
> their cpumask text representation may not fit into the full page in the
> worst case. Is my understanding correct? Can you share an example of
> such configuration?

in my understanding, I have not found any machine which has really
caused the problem till now. But the whole story began from this
thread when Jonatah and me tried to add a new topology level-cluster
which exists on kunpeng920 and X86 Jacobsville:
https://lore.kernel.org/lkml/YFRGIedW1fUlnmi+@xxxxxxxxx/
https://lore.kernel.org/lkml/YFR2kwakbcGiI37w@xxxxxxxxx/

in the discussion, Greg had some concern about the potential
overflow of sysfs ABI for topology. Greg's comment is reasonable
and I think we should address the problem.

For this moment, both numa node and cpu use cpu bitmap like 3,ffffffff to
expose hardware topology. When cpu number is large, the page buffer of
sysfs will overflow. This doesn't really happen nowadays as the maximum
NR_CPUS is 8196 for X86_64 and 4096 for ARM64 since 8196 * 9 / 32 = 2305
is still smaller than 4KB page size.

So the existing BUILD_BUG_ON() in drivers/base/node.c is pretty much
preventing future problems similar with Y2K when hardware gets more
and more CPUs:
static ssize_t node_read_cpumap(struct device *dev, bool list, char *buf)
{
cpumask_var_t mask;
struct node *node_dev = to_node(dev);

/* 2008/04/07: buf currently PAGE_SIZE, need 9 chars per 32 bits. */
BUILD_BUG_ON((NR_CPUS/32 * 9) > (PAGE_SIZE-1));
}

But other ABIs exposing cpu lists are much more tricky, they could be
like: 0, 3, 5, 7, 9, 11... etc. so nobody knows the size till the last
moment.

So in the previous discussion, we agreed to remove this kind of
limitation totally and remove the BUILD_BUG_ON().

>
> I think that the proper solution would be to create a function like
> void *cpumap_alloc_pagebuf(bool list), so that cpumap_print_to_pagebuf()
> will be always passed with enough portion of memory, and we'll just drop
> the PAGE_SIZE limit in cpumap_print_to_pagebuf().

I am sorry we missed you in the previous discussion:
https://lore.kernel.org/lkml/1619679819-45256-1-git-send-email-tiantao6@xxxxxxxxxxxxx/#t

I think we have figured out bin_attribute is the way to workaround
the limitation of sysfs:

https://lore.kernel.org/lkml/fd78ac30-dd3b-a7d7-eae8-193b09a7d49a@xxxxxxxxx/
https://lore.kernel.org/lkml/YIueOR4fOYa1dSAb@xxxxxxxxx/

Again sorry for you were missed in the previous discussion.

>
> >
> > v4:
> > add test cases for bitmap_print_to_buf API;
> > add Reviewed-by of Jonathan Cameron for patches 1-3, thanks!
> >
> > v3:
> > fixed the strlen issue and patch #1,#2,#3 minor formatting issues, thanks
> > to Andy Shevchenko and Jonathan Cameron.
> >
> > v2:
> > split the original patch #1 into two patches and use kasprintf() in
> > patch #1 to simplify the code. do some minor formatting adjustments.
> >
> > Barry Song (1):
> > lib: test_bitmap: add bitmap_print_to_buf test cases
> >
> > Tian Tao (3):
> > lib: bitmap: introduce bitmap_print_to_buf
> > topology: use bin_attribute to avoid buff overflow
> > drivers/base/node.c: use bin_attribute to avoid buff overflow
> >
> > drivers/base/node.c | 52 +++++++++-----
> > drivers/base/topology.c | 115 +++++++++++++++++--------------
> > include/linux/bitmap.h | 2 +
> > include/linux/cpumask.h | 21 ++++++
> > lib/bitmap.c | 37 +++++++++-
> > lib/test_bitmap.c | 149 ++++++++++++++++++++++++++++++++++++++++
> > 6 files changed, 304 insertions(+), 72 deletions(-)
> >
> > --
> > 2.25.1

Thanks
Barry