RE: [PATCH v5 1/3] cpumask: introduce cpumap_print_to_buf to support large bitmask and list

From: Song Bao Hua (Barry Song)
Date: Sat Jul 03 2021 - 04:31:24 EST




> -----Original Message-----
> From: Yury Norov [mailto:yury.norov@xxxxxxxxx]
> Sent: Saturday, July 3, 2021 9:31 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>
> Cc: gregkh@xxxxxxxxxxxxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx;
> andriy.shevchenko@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> dave.hansen@xxxxxxxxx; linux@xxxxxxxxxxxxxxxxxx; rafael@xxxxxxxxxx;
> rdunlap@xxxxxxxxxxxxx; agordeev@xxxxxxxxxxxxx; sbrivio@xxxxxxxxxx;
> jianpeng.ma@xxxxxxxxx; valentin.schneider@xxxxxxx; peterz@xxxxxxxxxxxxx;
> bristot@xxxxxxxxxx; guodong.xu@xxxxxxxxxx; tangchengchang
> <tangchengchang@xxxxxxxxxx>; Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>;
> yangyicong <yangyicong@xxxxxxxxxx>; tim.c.chen@xxxxxxxxxxxxxxx; Linuxarm
> <linuxarm@xxxxxxxxxx>; tiantao (H) <tiantao6@xxxxxxxxxxxxx>
> Subject: Re: [PATCH v5 1/3] cpumask: introduce cpumap_print_to_buf to support
> large bitmask and list
>
> On Fri, Jul 02, 2021 at 09:25:57PM +1200, Barry Song wrote:
> > From: Tian Tao <tiantao6@xxxxxxxxxxxxx>
> >
> > The existing cpumap_print_to_pagebuf() is used by cpu topology and other
> > drivers to export hexadecimal bitmask and decimal list to userspace by
> > sysfs ABI.
> >
> > Right now, those drivers are using a normal attribute for this kind of
> > ABIs. A normal attribute typically has show entry as below:
> >
> > static ssize_t example_dev_show(struct device *dev,
> > struct device_attribute *attr, char *buf)
> > {
> > ...
> > return cpumap_print_to_pagebuf(true, buf, &pmu_mmdc->cpu);
> > }
> > show entry of attribute has no offset and count parameters and this
> > means the file is limited to one page only.
> >
> > cpumap_print_to_pagebuf() API works terribly well for this kind of
> > normal attribute with buf parameter and without offset, count:
> >
> > static inline ssize_t
> > cpumap_print_to_pagebuf(bool list, char *buf, const struct cpumask *mask)
> > {
> > return bitmap_print_to_pagebuf(list, buf, cpumask_bits(mask),
> > nr_cpu_ids);
> > }
> >
> > The problem is once we have many cpus, we have a chance to make bitmask
> > or list more than one page. Especially for list, it could be as complex
> > as 0,3,5,7,9,...... We have no simple way to know it exact size.
> >
> > It turns out bin_attribute is a way to break this limit. bin_attribute
> > has show entry as below:
> > static ssize_t
> > example_bin_attribute_show(struct file *filp, struct kobject *kobj,
> > struct bin_attribute *attr, char *buf,
> > loff_t offset, size_t count)
> > {
> > ...
> > }
> >
> > With the new offset and count parameters, this makes sysfs ABI be able
> > to support file size more than one page. For example, offset could be
> > >= 4096.
> >
> > This patch introduces cpumap_print_to_buf() so that those drivers can
> > move to bin_attribute to support large bitmask and list. In result,
> > we have to pass the corresponding parameters from bin_attribute to this
> > new API.
> >
> > Signed-off-by: Tian Tao <tiantao6@xxxxxxxxxxxxx>
> > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > Cc: Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx>
> > Cc: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
> > Cc: Stefano Brivio <sbrivio@xxxxxxxxxx>
> > Cc: Alexander Gordeev <agordeev@xxxxxxxxxxxxx>
> > Cc: "Ma, Jianpeng" <jianpeng.ma@xxxxxxxxx>
> > Cc: Yury Norov <yury.norov@xxxxxxxxx>
> > Cc: Valentin Schneider <valentin.schneider@xxxxxxx>
> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > Cc: Daniel Bristot de Oliveira <bristot@xxxxxxxxxx>
> > Signed-off-by: Barry Song <song.bao.hua@xxxxxxxxxxxxx>
> > ---
> > include/linux/cpumask.h | 19 +++++++++++++++++++
> > lib/cpumask.c | 18 ++++++++++++++++++
> > 2 files changed, 37 insertions(+)
> >
> > diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
> > index bfc4690de4f4..24f410a2e793 100644
> > --- a/include/linux/cpumask.h
> > +++ b/include/linux/cpumask.h
> > @@ -983,6 +983,25 @@ cpumap_print_to_pagebuf(bool list, char *buf, const struct cpumask *mask)
> > nr_cpu_ids);
> > }
> >
> > +/**
> > + * cpumap_print_to_buf - copies the cpumask into the buffer either
> > + * as comma-separated list of cpus or hex values of cpumask;
> > + * Typically used by bin_attribute to export cpumask bitmask and
> > + * list ABI.
> > + * @list: indicates whether the cpumap must be list
> > + * true: print in decimal list format
> > + * false: print in hexadecimal bitmask format
> > + * @mask: the cpumask to copy
> > + * @buf: the buffer to copy into
> > + * @off: in the string from which we are copying, We copy to @buf
> > + * @count: the maximum number of bytes to print
> > + *
> > + * Returns the length of how many bytes have been copied.
> > + */
> > +extern ssize_t
> > +cpumap_print_to_buf(bool list, char *buf, const struct cpumask *mask,
> > + loff_t off, size_t count);
> > +
> > #if NR_CPUS <= BITS_PER_LONG
> > #define CPU_MASK_ALL \
> > (cpumask_t) { { \
> > diff --git a/lib/cpumask.c b/lib/cpumask.c
> > index c3c76b833384..40421a6d31bc 100644
> > --- a/lib/cpumask.c
> > +++ b/lib/cpumask.c
> > @@ -279,3 +279,21 @@ int cpumask_any_distribute(const struct cpumask *srcp)
> > return next;
> > }
> > EXPORT_SYMBOL(cpumask_any_distribute);
> > +
> > +ssize_t cpumap_print_to_buf(bool list, char *buf, const struct cpumask *mask,
> > + loff_t off, size_t count)
> > +{
> > + const char *fmt = list ? "%*pbl\n" : "%*pb\n";
> > + ssize_t size;
> > + void *data;
> > +
> > + data = kasprintf(GFP_KERNEL, fmt, nr_cpu_ids, cpumask_bits(mask));
> > + if (!data)
> > + return -ENOMEM;
> > +
> > + size = memory_read_from_buffer(buf, count, &off, data, strlen(data) + 1);
> > + kfree(data);
>
> Barry,
>
> It looks like my comments for previous iteration were ignored. I don't
> like the approach where you allocate potentially big amount of kernel
> memory just to free it almost immediately. Nor in lib/bitmap, neither
> in lib/cpumask.
>

Yury, clearly your comment was not ignored. I explained in this reply:
https://lore.kernel.org/lkml/bd62f55457ef4c269db5bb752b7accc0@xxxxxxxxxxxxx/

To restate what I said in that email more clearly:

I don't think moving the memory allocation into drivers is the right
approach: for the main users, bin attributes, there is no way to reuse
a buffer allocated in the driver.
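
A minimal userspace sketch of the read pattern under discussion, not the
patch itself. It assumes each read arrives as an independent call with
its own offset and count, so the formatted string is built, windowed and
freed within a single call. read_from_buffer() is a hypothetical stand-in
for the kernel's memory_read_from_buffer(), and even_cpus_print_to_buf()
is an invented helper that prints CPUs 0,2,4,... in list format:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */

/*
 * Hypothetical userspace stand-in for memory_read_from_buffer():
 * copy up to count bytes of from[0..available) starting at *ppos
 * into to, then advance *ppos by the number of bytes copied.
 */
static ssize_t read_from_buffer(char *to, size_t count, long *ppos,
				const char *from, size_t available)
{
	long pos = *ppos;

	if (pos < 0)
		return -1;
	if ((size_t)pos >= available)
		return 0;	/* reading at or past the end: nothing left */
	if (count > available - (size_t)pos)
		count = available - (size_t)pos;
	memcpy(to, from + pos, count);
	*ppos = pos + (long)count;
	return (ssize_t)count;
}

/*
 * Invented helper for illustration: format the even CPUs below ncpus
 * as a decimal list and return the window [off, off + count) in buf.
 * The temporary string only lives for the duration of one call.
 */
static ssize_t even_cpus_print_to_buf(char *buf, unsigned int ncpus,
				      long off, size_t count)
{
	/* at most 10 digits plus a comma per CPU, plus "\n" and NUL */
	size_t cap = (size_t)ncpus * 11 + 2;
	size_t len = 0;
	unsigned int cpu;
	ssize_t ret;
	char *data;

	assert(buf != NULL);

	data = malloc(cap);
	if (!data)
		return -1;

	for (cpu = 0; cpu < ncpus; cpu += 2)
		len += (size_t)snprintf(data + len, cap - len, "%s%u",
					cpu ? "," : "", cpu);
	len += (size_t)snprintf(data + len, cap - len, "\n");

	ret = read_from_buffer(buf, count, &off, data, len);
	free(data);
	return ret;
}
```

With ncpus = 8 the full text is "0,2,4,6\n" (8 bytes). Reading it in
4-byte chunks takes one call at offset 0 and one at offset 4, and each
call formats the whole string again before copying out its window.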

> For next iterations, please move this function back to lib/bitmap
> because there's no specific here for cpumasks.

I am OK with moving this back to the bitmap API, since that is actually
what I preferred. I only narrowed the scope of the change to cpumask to
ease your worry that somebody else would abuse the bitmap API.


>
> Thanks,
> Yury
>
> > + return size;
> > +}
> > +EXPORT_SYMBOL(cpumap_print_to_buf);

Thanks
Barry