Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster
From: Greg KH
Date:  Sun Jul 05 2020 - 07:44:57 EST
On Sun, Jul 05, 2020 at 01:07:14AM -0700, Vito Caputo wrote:
> On Sun, Jul 05, 2020 at 04:27:32AM +0100, Matthew Wilcox wrote:
> > On Sun, Jul 05, 2020 at 05:18:58AM +0200, Jan Ziak wrote:
> > > On Sun, Jul 5, 2020 at 5:12 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > > >
> > > > You should probably take a look at io_uring.  That has the level of
> > > > complexity of this proposal and supports open/read/close along with many
> > > > other opcodes.
> > > 
> > > Then glibc can implement readfile using io_uring and there is no need
> > > for a new single-file readfile syscall.
> > 
> > It could, sure.  But there's also a value in having a simple interface
> > to accomplish a simple task.  Your proposed API added a very complex
> > interface to satisfy needs that clearly aren't part of the problem space
> > that Greg is looking to address.
> 
> I disagree re: "aren't part of the problem space".
> 
> Reading small files from procfs was specifically called out in the
> rationale for the syscall.
> 
> In my experience you're rarely monitoring a single proc file in any
> situation where you care about the syscall overhead.  You're
> monitoring many of them, and any serious effort to do this efficiently
> in a repeatedly sampled situation has cached the open fds and already
> uses pread() to simply restart from 0 on every sample and not
> repeatedly pay for the name lookup.
That's your use case, but many other use cases are just "read a bunch of
sysfs files in one shot".  Examples of that are tools that monitor
uevents and lots of hardware-information gathering tools.

Also not all tools seem to be as smart as you think they are, look at
util-linux for loads of the "open/read/close" lots of files pattern.  I
had a half-baked patch to convert it to use readfile which I need to
polish off and post with the next series to show how this can be used to
both make userspace simpler as well as use less cpu time.
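
For anyone not familiar with the pattern, here is a rough sketch of the
open/read/close sequence that such tools repeat for every small file.
This is illustrative C only, not code taken from util-linux; the helper
name slurp_file() and its return convention are made up for this
example.  It costs at least three syscalls per file, which is the
overhead a single readfile call is meant to collapse.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from util-linux): read at most bufsize - 1
 * bytes of 'path' into 'buf' and NUL-terminate the result.
 * Returns the number of bytes read, or -1 with errno set.
 */
static ssize_t slurp_file(const char *path, char *buf, size_t bufsize)
{
	size_t total = 0;
	int fd;

	assert(buf != NULL && bufsize > 0);

	fd = open(path, O_RDONLY);		/* syscall 1: name lookup + open */
	if (fd < 0)
		return -1;

	while (total < bufsize - 1) {
		/* syscall 2..n: usually one read is enough for small files */
		ssize_t n = read(fd, buf + total, bufsize - 1 - total);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			int saved = errno;

			close(fd);
			errno = saved;
			return -1;
		}
		if (n == 0)			/* end of file */
			break;
		total += (size_t)n;
	}

	close(fd);				/* final syscall: close */
	buf[total] = '\0';
	return (ssize_t)total;
}
```

A caller would do something like slurp_file("/sys/class/net/lo/mtu",
buf, sizeof(buf)) once per attribute; that path is only an example.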
> Basically anything optimally using the existing interfaces for
> sampling proc files needs a way to read multiple open file descriptors
> in a single syscall to move the needle.
Is psutils using this type of interface, or do they constantly open
different files?
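
To be concrete about "this type of interface", the cached-fd approach
described above could be sketched roughly like this.  It is a minimal
illustration under my own assumptions, not code from any existing
monitoring tool; struct sampler and the sampler_*() names are
hypothetical.  The file is opened once, and every sample is a single
pread() at offset 0, so the name lookup is paid only at setup time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-file state: just the cached descriptor. */
struct sampler {
	int fd;
};

/* Open the file once; the path lookup happens only here. */
static int sampler_open(struct sampler *s, const char *path)
{
	s->fd = open(path, O_RDONLY);
	return s->fd < 0 ? -1 : 0;
}

/*
 * Take one sample: a single pread() from offset 0 on the cached fd.
 * Assumes the whole file fits in one read, which is the common case
 * for small proc/sysfs files; larger files would need a loop.
 * Returns bytes read (buf is NUL-terminated), or -1 with errno set.
 */
static ssize_t sampler_read(const struct sampler *s, char *buf, size_t bufsize)
{
	ssize_t n;

	assert(buf != NULL && bufsize > 0);

	do {
		n = pread(s->fd, buf, bufsize - 1, 0);
	} while (n < 0 && errno == EINTR);

	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}

/* Release the cached descriptor when monitoring stops. */
static void sampler_close(struct sampler *s)
{
	close(s->fd);
	s->fd = -1;
}
```

That is one syscall per file per sample, so with N files it is still N
syscalls per sampling pass.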
What about fun tools like bashtop:
	https://github.com/aristocratos/bashtop.git
which thankfully now relies on Python's psutil package to parse /proc in
semi-sane ways, but that package constantly does open/read/close on
proc files, from what I can tell.

And lots of people rely on Python's psutil, right?
> This syscall doesn't provide that.  It doesn't really give any
> advantage over what we can achieve already.  It seems basically
> pointless to me, from a monitoring proc files perspective.
What "good" monitoring programs do you suggest follow the pattern you
recommend?

thanks,

greg k-h