Re: [RFC PATCH] getvalues(2) prototype

From: Casey Schaufler
Date: Wed Mar 23 2022 - 09:51:50 EST


On 3/23/2022 6:24 AM, Miklos Szeredi wrote:
On Wed, 23 Mar 2022 at 12:43, Christian Brauner <brauner@xxxxxxxxxx> wrote:

Yes, we really need a way to query for various fs information. I'm a bit
torn about the details of this interface though. I would really like it
if we had interfaces that are really easy to use from userspace,
comparable to statx for example.
The reason I started thinking about this is that Amir wanted a per-sb
iostat interface and dumped it into /proc/PID/mountstats. And that is
definitely not the right way to go about this.

So we could add a statfsx() and start filling in new stuff, and that's
what Linus suggested. But then we might need to add stuff that is not
representable in a flat structure (like for example the stuff that
nfs_show_stats does) and that again needs new infrastructure.

Another example is task info in /proc. Utilities are doing a crazy
number of syscalls to get trivial information. Why don't we have a
procx(2) syscall? I guess because lots of that is difficult to
represent in a flat structure. Just take the lsof example: it's doing
hundreds of thousands of syscalls on a desktop computer with just a
few hundred processes.
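
To make the cost concrete, here is a rough sketch (not what lsof
actually does) that fetches a single small attribute, the command name,
for every process by reading /proc/PID/comm. The function name and the
"three syscalls per file" lower bound (open, read, close) are
illustrative assumptions; real tools read many more files per process.

```python
import os


def read_process_names(proc_root="/proc"):
    """Return ({pid: name}, files_read), reading <proc_root>/<pid>/comm per process."""
    names = {}
    files_read = 0
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return names, files_read  # no procfs available here
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm"), "rb") as f:
                names[int(entry)] = f.read().decode("utf-8", "replace").strip()
            files_read += 1
        except OSError:
            continue  # the process exited between listdir() and open()
    return names, files_read


if __name__ == "__main__":
    names, files_read = read_process_names()
    # Each file costs at least open + read + close.
    print(f"{len(names)} processes, {files_read} files read, "
          f"at least {3 * files_read} syscalls for one attribute each")
```

Even this minimal case scales linearly with the number of processes,
and every additional attribute multiplies the count again.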

So I'm trying to look beyond fsinfo and at how we could better
retrieve attributes, statistics, small bits and pieces within a
unified framework.

The ease of use argument does not really come into the picture here,
because (unlike stat and friends) most of this info is specialized and
will be consumed either by libraries, by specialized utilities
(util-linux, procps) or by a generic utility application that can
query any information about anything that is exported through such an
interface. That applies to plain stat(2) as well: most users will
not switch to statx() simply because that's too generic. And that's
fine, for info as common as struct stat a syscall is warranted. If
the info is more specialized, then I think a truly generic interface
is a much better choice.

I know having this as generic as possible was the
goal but I'm just a bit uneasy with such interfaces. They become
cumbersome to use in userspace. I'm not sure if the data: part for
example should be in this at all. That seems a bit out of place to me.
Good point, reduction of scope may help.

Would it be really that bad if we added multiple syscalls for different
types of info? For example, querying mount information could reasonably
be a more focussed, separate system call that retrieves detailed
mount propagation info, flags, idmappings and so on. Prior approaches to
solve this in a completely generic way have not gotten us very far
either, so I'm a bit worried about this aspect too.
And I fear that this will just result in more and more ad-hoc
interfaces being added, because a new feature didn't quite fit the old
API. You can see the history of this happening all over the place
with multiple new syscall versions being added as the old one turns
out to be not generic enough.

I think a new interface needs to

- be uniform (a single utility can be used to retrieve various
attributes and statistics, contrast this with e.g. stat(1),
getfattr(1), lsattr(1) not to mention various fs specific tools).

- have a hierarchical namespace (the unix path lookup is an example
of this that stood the test of time)

- allow retrieving arbitrary text or binary data

You also need a way to get a list of what attributes are available
and/or a way to get all available attributes. Applications and especially
libraries shouldn't have to guess what information is relevant. If the
attributes change depending on the filesystem and/or LSM involved, and
they do, how can a general purpose library function know what data to
ask for?
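
As a toy illustration of the point (a user-space model only, not the
getvalues(2) prototype or any existing kernel API), the sketch below
pairs a hierarchical attribute namespace with an enumeration call, so a
caller can discover names instead of guessing them. The class, its
methods and the attribute names ("mnt/flags" and so on) are all
hypothetical.

```python
class AttrNode:
    """One node in a hypothetical hierarchical attribute namespace."""

    def __init__(self, value=None):
        self.value = value    # bytes for a leaf, None for a pure directory
        self.children = {}

    def add(self, path, value):
        """Create intermediate nodes as needed and store value at path."""
        node = self
        for part in path.strip("/").split("/"):
            node = node.children.setdefault(part, AttrNode())
        node.value = value

    def lookup(self, path):
        """Walk path component by component; raises KeyError if missing."""
        node = self
        for part in (p for p in path.strip("/").split("/") if p):
            node = node.children[part]
        return node

    def get(self, path):
        """Return the raw bytes stored at path (text or binary)."""
        return self.lookup(path).value

    def list(self, path=""):
        """Enumerate the names available directly under path."""
        return sorted(self.lookup(path).children)


root = AttrNode()
root.add("mnt/flags", b"rw,nosuid")
root.add("mnt/propagation", b"shared")
root.add("xattr/user.example", b"\x00\x01binary")

# A generic library discovers what exists before asking for it.
for group in root.list():
    for name in root.list(group):
        print(f"{group}/{name} = {root.get(group + '/' + name)!r}")
```

With a list operation like this, the set of names can differ per
filesystem or LSM without the library having to know them in advance.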


And whatever form it takes, I'm sure it will be easier to use than the
mess we currently have in various interfaces like the mount or process
stats.

Thanks,
Miklos