RE: [PATCH 1/2] CPU, NUMA topology ABIs: clarify the overflow issue of sysfs pagebuf
From: Song Bao Hua (Barry Song)
Date: Fri Jul 23 2021 - 08:49:05 EST
> -----Original Message-----
> From: gregkh@xxxxxxxxxxxxxxxxxxx [mailto:gregkh@xxxxxxxxxxxxxxxxxxx]
> Sent: Friday, July 23, 2021 11:29 PM
> To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>
> Cc: Dave Hansen <dave.hansen@xxxxxxxxx>; tiantao (H) <tiantao6@xxxxxxxxxxxxx>;
> corbet@xxxxxxx; linux-doc@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> Rafael J. Wysocki <rafael@xxxxxxxxxx>; Peter Zijlstra <peterz@xxxxxxxxxxxxx>;
> Valentin Schneider <valentin.schneider@xxxxxxx>; Dave Hansen
> <dave.hansen@xxxxxxxxxxxxxxx>; Daniel Bristot de Oliveira
> <bristot@xxxxxxxxxx>; Linuxarm <linuxarm@xxxxxxxxxx>
> Subject: Re: [PATCH 1/2] CPU, NUMA topology ABIs: clarify the overflow issue
> of sysfs pagebuf
>
> On Fri, Jul 23, 2021 at 11:20:19AM +0000, Song Bao Hua (Barry Song) wrote:
> >
> >
> > > -----Original Message-----
> > > From: Dave Hansen [mailto:dave.hansen@xxxxxxxxx]
> > > Sent: Friday, April 30, 2021 10:39 AM
> > > To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>; tiantao (H)
> > > <tiantao6@xxxxxxxxxxxxx>; corbet@xxxxxxx; gregkh@xxxxxxxxxxxxxxxxxxx
> > > Cc: linux-doc@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Rafael J.
> > > Wysocki <rafael@xxxxxxxxxx>; Peter Zijlstra <peterz@xxxxxxxxxxxxx>;
> > > Valentin Schneider <valentin.schneider@xxxxxxx>; Dave Hansen
> > > <dave.hansen@xxxxxxxxxxxxxxx>; Daniel Bristot de Oliveira
> > > <bristot@xxxxxxxxxx>
> > > Subject: Re: [PATCH 1/2] CPU, NUMA topology ABIs: clarify the overflow issue
> > > of sysfs pagebuf
> > >
> > > On 4/29/21 3:32 PM, Song Bao Hua (Barry Song) wrote:
> > > > $ strace numactl --hardware 2>&1 | grep cpu
> > > > openat(AT_FDCWD, "/sys/devices/system/cpu",
> > > > O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
> > > > openat(AT_FDCWD, "/sys/devices/system/node/node0/cpumap", O_RDONLY) = 3
> > > > openat(AT_FDCWD, "/sys/devices/system/node/node1/cpumap", O_RDONLY) = 3
> > > > openat(AT_FDCWD, "/sys/devices/system/node/node2/cpumap", O_RDONLY) = 3
> > > > openat(AT_FDCWD, "/sys/devices/system/node/node3/cpumap", O_RDONLY) = 3
> > > >
> > > > If we move to binary, it means we have to change those applications.
> > >
> > > I thought Greg was saying to use a sysfs binary attribute using
> > > something like sysfs_create_bin_file(). Those don't have the
> > > PAGE_SIZE limitation. But, there's also nothing to keep us from spewing
> > > nice human-readable text via the "binary" file.
> > >
> > > We don't need to change the file format, just the internal kernel API
> > > that we produce the files with.
> >
> > Sorry for waking up the old thread.
> >
> > I am not sure how common it is for a regular device_attribute to be larger
> > than 4KB and to need a bin_attribute as a workaround. But I wrote a
> > prototype which provides initial support for large regular sysfs entries
> > and can fill the entire buffer with a single show() call. In other words,
> > we don't need to call read() of bin_attribute multiple times for a large
> > regular file. Right now, only read-only attributes are supported.
> >
> > Subject: [RFC PATCH] sysfs: support regular device attr which can be larger
> > than PAGE_SIZE
> >
> > A regular sysfs ABI could be more than 4KB. Right now, we are using
> > bin_attribute to work around this limit. A better solution
> > would be to support long device attributes. In this case, we will
> > still be able to enjoy the buffering and seeking advantages of
> > seq_file and only need to fill the entire buffer of the sysfs entry
> > once.
>
> No, please no. I WANT people to run into this problem and realize that
> it went totally wrong because they should not be having more than one
> "value" in a sysfs file like this.
>
> Let's not make it easy on people please, moving to a bin attribute is a
> big deal, let's keep it that way.
Ok. Got it. Thanks for the clarification. That was an experiment I made
recently when I almost got totally stuck.
>
> thanks,
>
> greg k-h
Thanks
Barry
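
For reference, a minimal userspace sketch of the point made earlier in the
thread: if the kernel switches cpumap to a bin attribute but keeps the same
text format, a reader that loops on read() until EOF keeps working unchanged.
This is not taken from numactl or from any patch in this thread. The helper
names read_whole_file() and parse_cpumap() are hypothetical, and the parser
assumes the usual bitmap format of comma-separated hex words with the most
significant word first.

```python
import os
import tempfile


def read_whole_file(path, chunk_size=4096):
    """Read a file with repeated read() calls until EOF (hypothetical helper)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        parts = []
        while True:
            data = os.read(fd, chunk_size)
            if not data:  # an empty read means end of file
                break
            parts.append(data)
    finally:
        os.close(fd)
    return b"".join(parts).decode("ascii")


def parse_cpumap(text):
    """Turn a comma-separated hex mask into a sorted list of CPU numbers.

    Assumption: words are hex, most significant first, and every word after
    the first is zero-padded to 8 digits.
    """
    mask = int(text.strip().replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


if __name__ == "__main__":
    # Use a temporary file instead of a real sysfs path so the sketch runs
    # anywhere; a tiny chunk size forces several read() calls.
    with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
        tmp.write("00000000,000000f0\n")
    try:
        print(parse_cpumap(read_whole_file(tmp.name, chunk_size=4)))
    finally:
        os.unlink(tmp.name)
```

The reader never depends on the content fitting in one PAGE_SIZE read, so
only the kernel-internal API would change and not the applications.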