Re: [PATCH v2 1/1] cpumask: Don't waste memory for sysfs cpulist nodes

From: Phil Auld
Date: Fri Sep 23 2022 - 07:36:46 EST


On Thu, Sep 22, 2022 at 06:45:27PM -0700 Yury Norov wrote:
> On Thu, Sep 22, 2022 at 5:38 PM Phil Auld <pauld@xxxxxxxxxx> wrote:
> >
> > Hi Andy,
> >
> > On Thu, Sep 22, 2022 at 10:49:54PM +0300 Andy Shevchenko wrote:
> > > Currently the approximation is used which wastes the more memory
> > > the more CPUs are present on the system. Proposed change calculates
> > > the exact maximum needed in the worst case:
> > >
> > > NR_CPUS old new
> > > ------- --- ---
> > > 1 .. 1170 4096 4096
> > > 1171 .. 1860 4098 .. 6510 4096
> > > ... ... ...
> > > 2*4096 28672 19925
> > > 4*4096 57344 43597
> > > 8*4096 114688 92749
> > > 16*4096 229376 191053
> > > 32*4096 458752 403197
> > > 64*4096 917504 861949
> > > 128*4096 1835008 1779453
> > > 256*4096 3670016 3670016
> > >
> > > Under the hood the reccurent formula is being used:
> > > (5 - 0) * 2 +
> > > (50 - 5) * 3 +
> > > (500 - 50) * 4 +
> > > (5000 - 500) * 5 +
> > > ...
> > > (X[i] - X[i-1]) * i
> > >
> > > which allows to count the exact maximum length in the worst case,
> > > i.e. when each second CPU is being listed. For backward compatibility
> > > for more than 1170 and less than 1861 CPUs the page size is preserved.
> > >
> > > For less than 1171 and more than 1 million CPUs the old is being used.
> >
> > The memory is not really wasted since it's probably temporary in userspace
> > and in the kernel it _is_ temporary and is only the length of the kasprintf
> > string, which is most of the time much less.
> >
> > But that said, it is more accurate than the previous estimate.
> >
> > I was wondering if you were going to try to come up with a suitable
> > compile time macro :)
> >
> > I tested 2, 8192 and 16k since the kernel does not want to build for other
> > reasons with NR_CPUS at 32k.
> >
> > Reviewed-by: Phil Auld <pauld@xxxxxxxxxx>
> > Tested-by: Phil Auld <pauld@xxxxxxxxxx>
>
> By the way, why don't we use nr_cpu_ids? It's set at boot-time, or even
> at compile-time in some cases and never changed at runtime.
>
> nr_cpu_ids is set very early, when ACPI tables are parsed. I don't think
> it's possible for a userspace to observe it uninitialized. Am I wrong?
>

I very much wanted it to be compile time so there was no overhead.

Is there a good place to init these bin_attrs at boottime, anyway?
I didn't dig into it farther, but it looks like the size is set
when declared. I suppose we could adjust it at the first read if
there was a way to do that. It's sort of all abstracted away to
common code though.

The fact that the values are larger than actually needed doesn't really
cause any harm. In the kernel no extra memory is allocated since it uses
the size of the sprintf of the actual data. And it's only allocated
while being read.

In userspace it may be a little extra memory while doing a read. But that's
also likely to be just temporary.

And if you have NR_CPUS 10000 you probably have a lot of wasted percpu
space anyway...


Cheers,
Phil

--