Re: [PATCH 1/4] statx: Add a system call to make enhanced file info available
From: Dave Chinner
Date: Thu Nov 17 2016 - 18:41:52 EST
On Thu, Nov 17, 2016 at 01:35:03PM +0000, David Howells wrote:
> Add a system call to make extended file information available, including
> file creation, data version and some attribute flags where available
> through the underlying filesystem.
>
>
> ========
> OVERVIEW
> ========
>
> The idea was initially proposed as a set of xattrs that could be retrieved
> with getxattr(), but the general preferance proved to be for a new syscall
> with an extended stat structure.
>
> This has a number of uses:
>
....
> (5) Data version number: Could be used by userspace NFS servers [Aneesh
> Kumar].
>
> Can also be used to modify fill_post_wcc() in NFSD which retrieves
> i_version directly, but has just called vfs_getattr(). It could get
> it from the kstat struct if it used vfs_xgetattr() instead.
This needs a much clearer name that stx_version as "version" is
entirely ambiguous. e.g. Inodes have internal version numbers to
disambiguate life cycles. and there are versioning filesystems
which have multiple versions of the same file.
So if stx_version this is intended to export the internal filesystem
inode change counter (i.e. inode->i_version) then lets call it that:
stx_modification_count. It's clear and unambiguous as to what it
represents, especially as this counter is more than just a "data
modification" counter - inode metadata modifications will also
cause it to change....
....
> (13) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags.
> Note that the Linux IOC flags are a mess and filesystems such as Ext4
> define flags that aren't in linux/fs.h, so translation in the kernel
> may be a necessity (or, possibly, we provide the filesystem type too).
And we now also have FS_IOC_FSGETXATTR that extends the flags
and information userspace can get from filesystems. It makes little
sense to now add xstat() and not add everything this interface
also exports...
> The defined bits in request_mask and stx_mask are:
>
> STATX_TYPE Want/got stx_mode & S_IFMT
> STATX_MODE Want/got stx_mode & ~S_IFMT
> STATX_NLINK Want/got stx_nlink
> STATX_UID Want/got stx_uid
> STATX_GID Want/got stx_gid
> STATX_ATIME Want/got stx_atime{,_ns}
> STATX_MTIME Want/got stx_mtime{,_ns}
> STATX_CTIME Want/got stx_ctime{,_ns}
> STATX_INO Want/got stx_ino
> STATX_SIZE Want/got stx_size
> STATX_BLOCKS Want/got stx_blocks
> STATX_BASIC_STATS [The stuff in the normal stat struct]
> STATX_BTIME Want/got stx_btime{,_ns}
> STATX_VERSION Want/got stx_version
> STATX_ALL [All currently available stuff]
What happens when an application uses STATX_ALL from a future kernel
that defines more flags than are initially supported, and that
application then is run on a kernel that onyl supports the initial
fields?
> stx_btime is the file creation time; stx_version is the data version number
> (i_version); stx_mask is a bitmask indicating the data provided; and
> __spares*[] are where as-yet undefined fields can be placed.
>
> Time fields are split into separate seconds and nanoseconds fields to make
> packing easier and the granularities can be queried with the filesystem
> info system call. Note that times will be negative if before 1970; in such
> a case, the nanosecond fields will also be negative if not zero.
So what happens in ten years time when we want to support
femptosecond resolution in the timestamp interface? We've got to
change everything to 64 bit? Shouldn't we just make everything
timestamp related 64 bit?
>
> The bits defined in the stx_attributes field convey information about a
> file, how it is accessed, where it is and what it does. The following
> attributes map to FS_*_FL flags and are the same numerical value:
Please isolate the new interface flags completely from the FS_*_FL
values. We should not repeat the mistake of tying values derived
from filesystem specific on-disk values to a user interface.
> STATX_ATTR_COMPRESSED File is compressed by the fs
> STATX_ATTR_IMMUTABLE File is marked immutable
> STATX_ATTR_APPEND File is append-only
> STATX_ATTR_NODUMP File is not to be dumped
> STATX_ATTR_ENCRYPTED File requires key to decrypt in fs
>
> The supported flags are listed by:
>
> STATX_ATTR_FS_IOC_FLAGS
Again, we have many more common and extended flags than this.
NOATIME and SYNC are two that immediately come to mind as generic
flags that should be in this...
>
> [Are any other IOC flags of sufficient general interest to be exposed
> through this interface?]
>
> New flags include:
>
> STATX_ATTR_NONUNIX_OWNERSHIP File doesn't have Unixy ownership
> STATX_ATTR_HAS_ACL File has an ACL
So statx will require us to do ACL lookups? i.e. instead of just
reading the inode to get the information, we'll also have to do
extended attribute lookups? That's potentially very expensive if
the extended attribute is not stored in the inode....
> STATX_ATTR_KERNEL_API File is kernel API (eg: procfs/sysfs)
> STATX_ATTR_REMOTE File is remote and needs network
> STATX_ATTR_FABRICATED File was made up by fs
Every file is fabricated by a filesystem :P
Perhaps you're wanting "virtual file" because it is has no physical
presence?
> Fields in struct statx come in a number of classes:
>
> (0) stx_dev_*, stx_blksize.
>
> These are local system information and are always available.
What does stx_blksize actually mean? It's completely ambiguous in
stat() because we don't actually report the physical block size
here - we report the "minimum unit of efficient IO" that we expect
applications to use. Please define :P
> =======
> TESTING
> =======
>
> The following test program can be used to test the statx system call:
>
> samples/statx/test-statx.c
>
> Just compile and run, passing it paths to the files you want to examine.
> The file is built automatically if CONFIG_SAMPLES is enabled.
Can we get xfstests written to exercise and validate all this
functionality, please? I'd suggest that adding xfs_io support for
the statx syscall would be far more useful for xfstests than a
standalone test program, too. We already have equivalent stat()
functionality in xfs_io and that's used quite a bit in xfstests....
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx