On Tue, Mar 22, 2022 at 08:27:12PM +0100, Miklos Szeredi wrote:
> Add a new userspace API that allows getting multiple short values in a
> single syscall.
>
> This would be useful for the following reasons:
>
> - Calling open/read/close for many small files is inefficient. E.g. on my
>   desktop invoking lsof(1) results in ~60k open + read + close calls under
>   /proc and 90% of those are 128 bytes or less.

How does doing the open/read/close in a single syscall make this any
more efficient? All it saves is the overhead of a couple of
syscalls; it doesn't reduce any of the setup or teardown overhead
needed to read the data itself...

> - Interfaces for getting various attributes and statistics are fragmented.
>   For files we have basic stat, statx, extended attributes, file attributes
>   (for which there are two overlapping ioctl interfaces). For mounts and
>   superblocks we have stat*fs as well as /proc/$PID/{mountinfo,mountstats}.
>   The latter also has the problem of not allowing queries on a specific
>   mount.

https://xkcd.com/927/

> - Some attributes are cheap to generate, some are expensive. Allowing
>   userspace to select which ones it needs should allow optimizing queries.
>
> - Adding an ascii namespace should allow easy extension and self
>   description.
>
> - The values can be text or binary, whichever fits best.
>
> The interface definition is:
>
> struct name_val {
> 	const char *name;	/* in */
> 	struct iovec value_in;	/* in */
> 	struct iovec value_out;	/* out */
> 	uint32_t error;		/* out */
> 	uint32_t reserved;
> };

Ahhh, XFS_IOC_ATTRMULTI_BY_HANDLE reborn. This is how xfsdump gets
and sets attributes efficiently when dumping and restoring files -
it's an interface that allows batches of xattr operations to be run
on a file in a single syscall.

I've said in the past when discussing things like statx() that maybe
everything should be addressable via the xattr namespace and
set/queried via xattr names regardless of how the filesystem stores
the data. The VFS/filesystem simply translates the name to the
storage location of the information. It might be held in xattrs, but
it could just be a flag bit in an inode field.

Then we just get named xattrs in batches from an open fd.
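
A rough userspace model of that idea, purely as a sketch: a name table
translates each attribute name to a getter, and the getter decides
whether the value comes from a flag bit or from stored data. Everything
here (struct toy_inode, the attribute names, toy_getattr_batch()) is
hypothetical and is not existing kernel code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

#define TOY_FL_IMMUTABLE 0x1u

/* Hypothetical inode: one value lives in a flag bit, one is stored data. */
struct toy_inode {
	uint32_t flags;
	char label[32];		/* stands in for an on-disk xattr */
};

typedef ssize_t (*attr_get_fn)(const struct toy_inode *, void *, size_t);

static ssize_t copy_out(const void *src, size_t n, void *buf, size_t len)
{
	if (n > len)
		return -ERANGE;
	memcpy(buf, src, n);
	return (ssize_t)n;
}

/* Value derived from a flag bit in an inode field. */
static ssize_t get_immutable(const struct toy_inode *ino, void *buf, size_t len)
{
	char v = (ino->flags & TOY_FL_IMMUTABLE) ? '1' : '0';

	return copy_out(&v, 1, buf, len);
}

/* Value held as stored attribute data. */
static ssize_t get_label(const struct toy_inode *ino, void *buf, size_t len)
{
	return copy_out(ino->label, strlen(ino->label), buf, len);
}

/* Name -> storage location translation table. */
static const struct {
	const char *name;
	attr_get_fn get;
} attr_table[] = {
	{ "inode.immutable", get_immutable },
	{ "user.label", get_label },
};

static ssize_t toy_getattr(const struct toy_inode *ino, const char *name,
			   void *buf, size_t len)
{
	size_t i;

	for (i = 0; i < sizeof(attr_table) / sizeof(attr_table[0]); i++)
		if (strcmp(attr_table[i].name, name) == 0)
			return attr_table[i].get(ino, buf, len);
	return -ENODATA;
}

/* One batched request: each entry gets its own length or -errno result. */
struct toy_req {
	const char *name;
	void *buf;
	size_t len;
	ssize_t result;
};

/* Returns the number of requests that succeeded. */
static int toy_getattr_batch(const struct toy_inode *ino,
			     struct toy_req *reqs, size_t n)
{
	int ok = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		reqs[i].result = toy_getattr(ino, reqs[i].name,
					     reqs[i].buf, reqs[i].len);
		if (reqs[i].result >= 0)
			ok++;
	}
	return ok;
}
```

The caller never learns where a value is stored; it only names what it
wants and gets a per-entry result back.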

> int getvalues(int dfd, const char *path, struct name_val *vec, size_t num,
>               unsigned int flags);
>
> @dfd and @path are used to look up object $ORIGIN. @vec contains @num
> name/value descriptors. @flags contains lookup flags for @path.
>
> The syscall returns the number of values filled or an error.
>
> A single name/value descriptor has the following fields:
>
> @name describes the object whose value is to be returned. E.g.
>
> mnt                    - list of mount parameters
> mnt:mountpoint         - the mountpoint of the mount of $ORIGIN
> mntns                  - list of mount IDs reachable from the current root
> mntns:21:parentid      - parent ID of the mount with ID of 21
> xattr:security.selinux - the security.selinux extended attribute
> data:foo/bar           - the data contained in file $ORIGIN/foo/bar

How are these different from just declaring new xattr namespaces for
these things? E.g. open any file and list the xattrs in the
xattr:mount.mnt namespace to get the list of mount parameters for
that mount.
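
With the existing fd-based xattr calls, listing a namespace might look
something like the sketch below. flistxattr() and fgetxattr() are the
real Linux calls; the for_each_xattr() helper, its callback type and the
256-byte value limit are assumptions made for illustration, and a prefix
such as "mount.mnt." is a namespace that no kernel exposes today.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical callback type: called once per matching attribute. */
typedef void (*xattr_cb)(const char *name, const void *val, size_t len,
			 void *ctx);

/*
 * Walk every xattr on fd whose name starts with prefix.
 * Returns the number of attributes passed to cb, or -1 on error.
 */
static int for_each_xattr(int fd, const char *prefix, xattr_cb cb, void *ctx)
{
	size_t plen = strlen(prefix);
	ssize_t lsz, vsz;
	char *names, *p;
	char val[256];		/* assumed upper bound for short values */
	int count = 0;

	/* First call sizes the name list, second call fetches it. */
	lsz = flistxattr(fd, NULL, 0);
	if (lsz < 0)
		return -1;
	if (lsz == 0)
		return 0;

	names = malloc((size_t)lsz);
	if (!names)
		return -1;

	lsz = flistxattr(fd, names, (size_t)lsz);
	if (lsz < 0) {
		free(names);
		return -1;
	}

	/* The list is a sequence of NUL-terminated names. */
	for (p = names; p < names + lsz; p += strlen(p) + 1) {
		if (strncmp(p, prefix, plen) != 0)
			continue;
		vsz = fgetxattr(fd, p, val, sizeof(val));
		if (vsz < 0)
			continue;	/* too large, or removed meanwhile */
		cb(p, val, (size_t)vsz, ctx);
		count++;
	}

	free(names);
	return count;
}
```

Note that this still costs one fgetxattr() call per name, which is
exactly the part a batched xattr interface would collapse.
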

Why do we need a new "xattr in everything but name" interface when
we could just extend the one we've already got and formalise a new,
cleaner version of xattr batch APIs that have been around for 20-odd
years already?

Cheers,

Dave.