[PATCH 12/25] fsinfo: Add API documentation [ver #14]

From: David Howells
Date: Mon Jun 24 2019 - 10:10:41 EST


Add API documentation for fsinfo.

Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
---

Documentation/filesystems/fsinfo.rst | 596 ++++++++++++++++++++++++++++++++++
1 file changed, 596 insertions(+)
create mode 100644 Documentation/filesystems/fsinfo.rst
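
Note (not part of the patch): the byte grouping of capability bits described
under FSINFO_ATTR_CAPABILITIES in the document below (capabilities 0-7 in
byte 0, 8-15 in byte 1, and so on) might be tested from userspace roughly as
follows. This is only a minimal sketch. The helper name example_cap_is_set()
is hypothetical, and the bit order within each byte (least significant bit
first) is an assumption that the document does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: test capability number "cap" in a capability
 * byte array of "nbytes" bytes, as might be returned for
 * FSINFO_ATTR_CAPABILITIES.
 *
 * ASSUMPTION: within a byte, capability N is bit (N % 8), counting from
 * the least significant bit.  The document only specifies which byte a
 * capability lands in, not the bit order inside it.
 */
bool example_cap_is_set(const unsigned char *caps, size_t nbytes,
			unsigned int cap)
{
	assert(caps != NULL);

	/* Bits beyond the returned data are treated as false. */
	if (cap / 8 >= nbytes)
		return false;

	return (caps[cap / 8] & (1u << (cap % 8))) != 0;
}
```

Treating out-of-range capabilities as false mirrors the documented behaviour
that capability bits the kernel does not support read as false.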

diff --git a/Documentation/filesystems/fsinfo.rst b/Documentation/filesystems/fsinfo.rst
new file mode 100644
index 000000000000..497a90f61b3b
--- /dev/null
+++ b/Documentation/filesystems/fsinfo.rst
@@ -0,0 +1,596 @@
+================================
+Filesystem Information Retrieval
+================================
+
+The fsinfo() system call allows the retrieval of filesystem and filesystem
+security information beyond what stat(), statx() and statfs() can query. It
+does not require a file to be opened as does ioctl().
+
+fsinfo() may be called on a path, an open file descriptor, a filesystem-context
+file descriptor as allocated by fsopen() or fspick() or a mount ID (allowing
+for mounts concealed by overmounts to be accessed).
+
+The fsinfo() system call needs to be configured on by enabling:
+
+ "File systems"/"Enable the fsinfo() system call" (CONFIG_FSINFO)
+
+This document has the following sections:
+
+.. contents:: :local:
+
+
+Overview
+========
+
+The fsinfo() system call retrieves one of a number of attributes, specified by
+the "fsinfo_attribute" enumeration::
+
+ FSINFO_ATTR_STATFS - statfs()-style state
+ FSINFO_ATTR_FSINFO - Information about fsinfo() itself
+ FSINFO_ATTR_IDS - Filesystem IDs
+ FSINFO_ATTR_LIMITS - Filesystem limits
+ ...
+
+Each attribute has one of a number of types and, moreover, may have multiple
+values, accessible as a 1D-array or a 2D array-of-arrays. The attribute types
+are:
+
+ * ``Struct``. This is a structure with a version-dependent length. New
+ versions of the kernel may append more fields, though they are not
+ permitted to remove or replace old ones.
+
+ Older applications, expecting an older version of the field, can ask for a
+ shorter struct and will only get the fields they requested; newer
+ applications running on an older kernel will get the extra fields they
+ requested filled with zeros. Either way, the kernel returns the actual size
+ of the internal struct, regardless of how much data it returned.
+
+ This allows for struct-type fields to be extended in future.
+
+ * ``String``. This is a variable-length string of up to 4096 characters (no
+ NUL character is included). The returned string will be truncated if the
+ output buffer is too small. The total size of the string is returned,
+ regardless of any truncation.
+
+ * ``Array``. This is a variable-length array of fixed-size structures. The
+ element size may not vary over time, so the element format must be designed
+ with care. The maximum length is INT_MAX bytes, though this depends on the
+ kernel being able to allocate an internal buffer large enough.
+
+ * ``Opaque``. This is a variable-length blob of indeterminate structure. It
+ may be up to INT_MAX bytes in size.
+
+
+Filesystem API
+==============
+
+The filesystem is called through a super_operations method::
+
+ int (*fsinfo) (struct path *path, struct fsinfo_kparams *params);
+
+where "path" indicates the object to be queried and params indicates the
+parameters and the output buffer description. The function should return the
+total size of the data it would like to produce or an error.
+
+The parameter struct looks like::
+
+ struct fsinfo_kparams {
+ enum fsinfo_attribute request;
+ __u32 Nth;
+ __u32 Mth;
+ unsigned int buf_size;
+ unsigned int usage;
+ void *buffer;
+ char *scratch_buffer;
+ ...
+ };
+
+The fields relevant to the filesystem are as follows:
+
+ * ``request``
+
+ Which attribute is being requested. EOPNOTSUPP should be returned if the
+ attribute is not supported by the filesystem or the LSM.
+
+ * ``Nth`` and ``Mth``
+
+ Which value of an attribute is being requested.
+
+ For a single-value attribute Nth and Mth will both be 0.
+
+ For a "1D" attribute, Nth will indicate which value and Mth will always
+ be 0. Take, for example, FSINFO_ATTR_SERVER_NAME - for a network
+ filesystem, the superblock will be backed by a number of servers. This will
+ return the name of the Nth server. ENODATA will be returned if Nth goes
+ beyond the end of the array.
+
+ For a "2D" attribute, Mth will indicate the index in the Nth set of values.
+ Take, for example, FSINFO_ATTR_SERVER_ADDRESS - each
+ server listed by FSINFO_ATTR_SERVER_NAME may have one or more addresses.
+ This will return the Mth address of the Nth server. ENODATA will be
+ returned if the Nth set doesn't exist or the Mth element of the Nth set
+ doesn't exist.
+
+ * ``buf_size``
+
+ This indicates the current size of the buffer. For the array type and the
+ opaque type this will be increased if the current buffer won't hold the
+ value and the filesystem will be called again.
+
+ * ``usage``
+
+ This indicates how much of the buffer has been used so far for an array or
+ opaque type attribute. This is updated by the fsinfo_note_param*()
+ functions.
+
+ * ``buffer``
+
+ This points to the output buffer. For struct-type and string-type
+ attributes it will always be big enough; for array- and opaque-type, it will
+ be buf_size in size and will be resized if the returned size is larger than
+ this.
+
+ * ``scratch_buffer``
+
+ For array- and opaque-type attributes, this will point to a 4096-byte
+ scratch buffer. Sometimes the value needs to be generated by sprintf(),
+ say, to find out how big it is going to be, but that might not be possible in
+ the main buffer without risking an overrun.
+
+To simplify filesystem code, there will always be at least a minimal buffer
+available if the ->fsinfo() method gets called - and the filesystem should
+always write what it can into the buffer. It's possible that the fsinfo()
+system call will then throw the contents away and just return the length.
+
+
+Helper Functions
+================
+
+The API includes a number of helper functions:
+
+ * ``int generic_fsinfo(struct path *path, struct fsinfo_kparams *params);``
+
+ This is the function that does default actions for filling out attribute
+ values from standard data, such as may be found in the file_system_type
+ struct and the super_block struct. It also generates -EOPNOTSUPP for
+ unsupported attributes.
+
+ This should be called by a filesystem if it doesn't want to handle an
+ attribute. The filesystem may also call this function and then adjust the
+ information returned, such as changing the listed capability flags.
+
+ * ``void fsinfo_set_cap(struct fsinfo_capabilities *c,
+ enum fsinfo_capability cap);``
+
+ This function sets a capability flag.
+
+ * ``void fsinfo_clear_cap(struct fsinfo_capabilities *c,
+ enum fsinfo_capability cap);``
+
+ This function clears a capability flag.
+
+ * ``void fsinfo_set_unix_caps(struct fsinfo_capabilities *caps);``
+
+ Set capability flags appropriate to the features of a standard UNIX
+ filesystem, such as having numeric UIDs and GIDs; allowing the creation of
+ directories, symbolic links, hard links, device files, FIFO and socket
+ files; permitting sparse files; and having access, change and modification
+ times.
+
+ * ``void fsinfo_note_sb_params(struct fsinfo_kparams *params,
+ unsigned int s_flags);``
+
+ This function notes the standard parameters corresponding to certain
+ ``SB_*`` flags in ``sb->s_flags`` into the parameter buffer. The filesystem
+ is at liberty to adjust the s_flags mask as it sees fit.
+
+ This is intended for use with FSINFO_ATTR_PARAMETERS.
+
+ * ``void fsinfo_note_param(struct fsinfo_kparams *params, const char *key,
+ const char *val);``
+
+ This function writes a pair of strings with prepended lengths into
+ params->buffer, if there's space, and always updates params->usage. The
+ assumption is that the caller of s->s_op->fsinfo() will resize the buffer if
+ the usage grew too large and call again.
+
+ This is intended for use with FSINFO_ATTR_{,LSM_}PARAMETERS, but is not
+ limited to those. The format allows binary data, though this API function
+ does not support anything with NUL characters in it.
+
+ Note that this function will not sleep, so is safe to call with locks held.
+
+ * ``void fsinfo_note_paramf(struct fsinfo_kparams *params, const char *key,
+ const char *val_fmt, ...);``
+
+ This function is a simple wrapper around fsinfo_note_param(), writing the
+ value using vsnprintf() into params->scratch_buffer and then jumping to
+ fsinfo_note_param().
+
+
+Attribute Summary
+=================
+
+To summarise the attributes that are defined::
+
+ Symbolic name Type
+ ===================================== ===============
+ FSINFO_ATTR_STATFS struct
+ FSINFO_ATTR_FSINFO struct
+ FSINFO_ATTR_IDS struct
+ FSINFO_ATTR_LIMITS struct
+ FSINFO_ATTR_SUPPORTS struct
+ FSINFO_ATTR_CAPABILITIES struct
+ FSINFO_ATTR_TIMESTAMP_INFO struct
+ FSINFO_ATTR_VOLUME_ID string
+ FSINFO_ATTR_VOLUME_UUID struct
+ FSINFO_ATTR_VOLUME_NAME string
+ FSINFO_ATTR_NAME_ENCODING string
+ FSINFO_ATTR_NAME_CODEPAGE string
+ FSINFO_ATTR_PARAM_DESCRIPTION struct
+ FSINFO_ATTR_PARAM_SPECIFICATION N × struct
+ FSINFO_ATTR_PARAM_ENUM N × struct
+ FSINFO_ATTR_PARAMETERS opaque
+ FSINFO_ATTR_LSM_PARAMETERS opaque
+ FSINFO_ATTR_MOUNT_INFO struct
+ FSINFO_ATTR_MOUNT_DEVNAME string
+ FSINFO_ATTR_MOUNT_CHILDREN array
+ FSINFO_ATTR_MOUNT_SUBMOUNT N × string
+ FSINFO_ATTR_SERVER_NAME N × string
+ FSINFO_ATTR_SERVER_ADDRESS N × M × struct
+ FSINFO_ATTR_CELL_NAME string
+
+
+Attribute Catalogue
+===================
+
+A number of the attributes convey information about a filesystem superblock:
+
+ * ``FSINFO_ATTR_STATFS``
+
+ This struct-type attribute gives most of the equivalent data to statfs(),
+ but with all the fields as unconditional 64-bit or 128-bit integers. Note
+ that static data like IDs that don't change are retrieved with
+ FSINFO_ATTR_IDS instead.
+
+ Further, superblock flags (such as MS_RDONLY) are not exposed by this
+ attribute; rather the parameters must be listed and the attributes picked
+ out from that.
+
+ * ``FSINFO_ATTR_IDS``
+
+ This struct-type attribute conveys various identifiers used by the target
+ filesystem. This includes the filesystem name, the NFS filesystem ID, the
+ superblock ID used in notifications, the filesystem magic type number and
+ the primary device ID.
+
+ * ``FSINFO_ATTR_LIMITS``
+
+ This struct-type attribute conveys the limits on various aspects of a
+ filesystem, such as maximum file, symlink and xattr sizes, maximum filename
+ and xattr name length, maximum number of symlinks, maximum device major and
+ minor numbers and maximum UID, GID and project ID numbers.
+
+ * ``FSINFO_ATTR_SUPPORTS``
+
+ This struct-type attribute conveys information about the support the
+ filesystem has for various UAPI features of a filesystem. This includes
+ information about which bits are supported in various masks employed by the
+ statx system call, what FS_IOC_* flags are supported by ioctls and what
+ DOS/Windows file attribute flags are supported.
+
+ * ``FSINFO_ATTR_CAPABILITIES``
+
+ This is a special attribute, being a set of single-bit capability flags,
+ formatted as a struct-type attribute. The meanings of the capability bits
+ are listed below - see the "Capability Bit Catalogue" section. The
+ capability bits are grouped numerically into bytes, such that capabilities
+ 0-7 are in byte 0, 8-15 are in byte 1, 16-23 in byte 2 and so on.
+
+ Any capability bit that's not supported by the kernel will be set to false
+ if asked for. The highest supported capability can be obtained from
+ attribute "FSINFO_ATTR_FSINFO".
+
+ * ``FSINFO_ATTR_TIMESTAMP_INFO``
+
+ This struct-type attribute conveys information about the resolution and
+ range of the timestamps available in a filesystem. The resolutions are
+ given as a mantissa and exponent (resolution = mantissa * 10^exponent
+ seconds), where the exponent can be negative to indicate a sub-second
+ resolution (-9 being nanoseconds, for example).
+
+ * ``FSINFO_ATTR_VOLUME_ID``
+
+ This is a string-type attribute that conveys the superblock identifier for
+ the volume. By default it will be filled in from the contents of s_id from
+ the superblock. For a block-based filesystem, for example, this might be
+ the name of the primary block device.
+
+ * ``FSINFO_ATTR_VOLUME_UUID``
+
+ This is a struct-type attribute that conveys the UUID identifier for the
+ volume. By default it will be filled in from the contents of s_uuid from
+ the superblock. If this doesn't exist, it will be entirely zeros.
+
+ * ``FSINFO_ATTR_VOLUME_NAME``
+
+ This is a string-type attribute that conveys the name of the volume. By
+ default it will return EOPNOTSUPP. For a disk-based filesystem, it might
+ convey the partition label; for a network-based filesystem, it might convey
+ the name of the remote volume.
+
+ * ``FSINFO_ATTR_NAME_ENCODING``
+
+ This is a string-type attribute that returns the type of encoding used for
+ filenames in the medium. By default this will be filled in with "utf8".
+ Not all filesystems can support that, however, so this may indicate a
+ restriction on what characters can be used.
+
+ * ``FSINFO_ATTR_NAME_CODEPAGE``
+
+ This is a string-type attribute that returns the name of the codepage used
+ to transliterate a Linux utf8 filename into whatever the medium supports.
+ By default it returns EOPNOTSUPP.
+
+
+The next attributes give information about the mount parameter parsers and the
+mount parameter values stored in a superblock and its security data. The
+first few of these can be queried on the file descriptor returned by fsopen()
+before any superblock is attached:
+
+ * ``FSINFO_ATTR_PARAM_DESCRIPTION``
+
+ This is a struct-type attribute that returns summary information about what
+ mount options are available on a filesystem, including the number of
+ parameters and the number of enum symbols.
+
+ * ``FSINFO_ATTR_PARAM_SPECIFICATION``
+
+ This is a 1D array of struct-type attributes, indicating the type,
+ qualifiers, name and an option ID for the Nth mount parameter. Parameters
+ that have the same option ID are presumed to be synonyms.
+
+ * ``FSINFO_ATTR_PARAM_ENUM``
+
+ This is a 1D array of struct-type attributes, indicating the Nth value
+ symbol for the set of enumeration-type parameters. All the values are in
+ the same table, so they can be matched to the parameter by option ID, and
+ each option ID may have several entries, each with a different name.
+
+ * ``FSINFO_ATTR_PARAMETERS``
+ * ``FSINFO_ATTR_LSM_PARAMETERS``
+
+ These are a pair of opaque blobs that list all the mount parameter values
+ currently set on a superblock. The first comes from the filesystem and the
+ second comes from the LSMs and, as such, conveys security information,
+ such as labelling.
+
+ Inside the filesystem or LSM, the parameter values should be read in one go
+ under lock to avoid races with remount if necessary.
+
+ Each opaque blob is encoded as a series of pairs of elements, where each
+ element begins with a length. The first element of each pair is the key
+ name and the second is the value (which may contain commas, binary data,
+ NUL chars).
+
+ An element length is encoded as a series of bytes in most->least significant
+ order. Each byte contributes 7 bits to the length. The MSB in each byte
+ is set if there's another byte of length information following on (ie. all
+ but the last byte in the length have the MSB set).
+
+ A number of helper functions are provided to help record the parameters::
+
+ fsinfo_note_sb_params()
+ fsinfo_note_param()
+ fsinfo_note_paramf()
+
+ Note that the first is not applicable to LSM parameters. It is called
+ automatically if the filesystem doesn't implement the attribute, but
+ should be called manually otherwise. It should also be called first,
+ before noting any other parameters.
+
+
+Then there are attributes that convey information about the mount topology:
+
+ * ``FSINFO_ATTR_MOUNT_INFO``
+
+ This struct-type attribute conveys information about a mount topology node
+ rather than a superblock. This includes the ID of the superblock mounted
+ there and the ID of the mount node, its parent, group, master and
+ propagation source. It also contains the attribute flags for the mount and
+ a change notification counter so that it can be quickly determined if that
+ node changed.
+
+ * ``FSINFO_ATTR_MOUNT_DEVNAME``
+
+ This string-type attribute returns the "device name" that was supplied when
+ the mount object was created.
+
+ * ``FSINFO_ATTR_MOUNT_CHILDREN``
+
+ This is an array-type attribute that conveys a set of structs, each of
+ which indicates the mount ID of a child and the change counter for that
+ child. The kernel also tags an extra element on the end that indicates the
+ ID and change counter of the queried object. This allows a conflicting
+ change to be quickly detected by comparing the before and after counters.
+
+ * ``FSINFO_ATTR_MOUNT_SUBMOUNT``
+
+ This is a string-type attribute that conveys the pathname of the Nth
+ mountpoint under the target mount, relative to the mount root or the
+ chroot, whichever is closer. These correspond on a 1:1 basis with the
+ elements in the FSINFO_ATTR_MOUNT_CHILDREN list.
+
+Then there are filesystem-specific attributes.
+
+ * ``FSINFO_ATTR_SERVER_NAME``
+
+ This is a string-type attribute that conveys the name of the Nth server
+ backing a network-filesystem superblock.
+
+ * ``FSINFO_ATTR_SERVER_ADDRESS``
+
+ This is a struct-type attribute that conveys the Mth address of the Nth
+ server, as returned by FSINFO_ATTR_SERVER_NAME.
+
+ * ``FSINFO_ATTR_CELL_NAME``
+
+ This is a string-type attribute that retrieves the AFS cell name of the
+ target object.
+
+
+Lastly, one attribute gives information about fsinfo() itself:
+
+ * ``FSINFO_ATTR_FSINFO``
+
+ This struct-type attribute gives information about the fsinfo() system call
+ itself, including the maximum number of attributes supported and the
+ maximum number of capability bits supported.
+
+
+Capability Bit Catalogue
+========================
+
+The capability bits convey single true/false assertions about a specific
+instance of a filesystem (ie. a specific superblock). They are accessed using
+the "FSINFO_ATTR_CAPABILITIES" attribute:
+
+ * ``FSINFO_CAP_IS_KERNEL_FS``
+ * ``FSINFO_CAP_IS_BLOCK_FS``
+ * ``FSINFO_CAP_IS_FLASH_FS``
+ * ``FSINFO_CAP_IS_NETWORK_FS``
+ * ``FSINFO_CAP_IS_AUTOMOUNTER_FS``
+ * ``FSINFO_CAP_IS_MEMORY_FS``
+
+ These indicate what kind of filesystem the target is: kernel API (proc),
+ block-based (ext4), flash/nvm-based (jffs2), remote over the network (NFS),
+ local quasi-filesystem that acts as a tray of mountpoints (autofs), plain
+ in-memory filesystem (shmem).
+
+ * ``FSINFO_CAP_AUTOMOUNTS``
+
+ This indicates if a filesystem may have objects that are automount points.
+
+ * ``FSINFO_CAP_ADV_LOCKS``
+ * ``FSINFO_CAP_MAND_LOCKS``
+ * ``FSINFO_CAP_LEASES``
+
+ These indicate if a filesystem supports advisory locks, mandatory locks or
+ leases.
+
+ * ``FSINFO_CAP_UIDS``
+ * ``FSINFO_CAP_GIDS``
+ * ``FSINFO_CAP_PROJIDS``
+
+ These indicate if a filesystem supports/stores/transports numeric user IDs,
+ group IDs or project IDs. The "FSINFO_ATTR_LIMITS" attribute can be used
+ to find out the upper limits on the ID values.
+
+ * ``FSINFO_CAP_STRING_USER_IDS``
+
+ This indicates if a filesystem supports/stores/transports string user
+ identifiers.
+
+ * ``FSINFO_CAP_GUID_USER_IDS``
+
+ This indicates if a filesystem supports/stores/transports Windows GUIDs as
+ user identifiers (eg. ntfs).
+
+ * ``FSINFO_CAP_WINDOWS_ATTRS``
+
+ This indicates if a filesystem supports Windows FILE_* attribute bits
+ (eg. cifs, jfs). The "FSINFO_ATTR_SUPPORTS" attribute can be used to find
+ out which Windows file attributes are supported by the filesystem.
+
+ * ``FSINFO_CAP_USER_QUOTAS``
+ * ``FSINFO_CAP_GROUP_QUOTAS``
+ * ``FSINFO_CAP_PROJECT_QUOTAS``
+
+ These indicate if a filesystem supports quotas for users, groups or
+ projects.
+
+ * ``FSINFO_CAP_XATTRS``
+
+ This indicates if a filesystem supports extended attributes. The
+ "FSINFO_ATTR_LIMITS" attribute can be used to find out the upper limits on
+ the supported name and body lengths.
+
+ * ``FSINFO_CAP_JOURNAL``
+ * ``FSINFO_CAP_DATA_IS_JOURNALLED``
+
+ These indicate whether the filesystem has a journal and whether data
+ changes are logged to it.
+
+ * ``FSINFO_CAP_O_SYNC``
+ * ``FSINFO_CAP_O_DIRECT``
+
+ These indicate whether the filesystem supports the O_SYNC and O_DIRECT
+ flags.
+
+ * ``FSINFO_CAP_VOLUME_ID``
+ * ``FSINFO_CAP_VOLUME_UUID``
+ * ``FSINFO_CAP_VOLUME_NAME``
+ * ``FSINFO_CAP_VOLUME_FSID``
+
+ These indicate whether ID, UUID, name and FSID identifiers actually exist
+ in the filesystem and thus might be considered persistent.
+
+ * ``FSINFO_CAP_IVER_ALL_CHANGE``
+ * ``FSINFO_CAP_IVER_DATA_CHANGE``
+ * ``FSINFO_CAP_IVER_MONO_INCR``
+
+ These indicate whether i_version in the inode is supported and, if so, what
+ mode it operates in. The first two indicate if it's changed for any data
+ or metadata change, or whether it's only changed for any data changes; the
+ last indicates whether or not it's monotonically increasing for each such
+ change.
+
+ * ``FSINFO_CAP_HARD_LINKS``
+ * ``FSINFO_CAP_HARD_LINKS_1DIR``
+
+ These indicate whether the filesystem can have hard links made in it, and
+ whether they can be made between directories or only within the same
+ directory.
+
+ * ``FSINFO_CAP_DIRECTORIES``
+ * ``FSINFO_CAP_SYMLINKS``
+ * ``FSINFO_CAP_DEVICE_FILES``
+ * ``FSINFO_CAP_UNIX_SPECIALS``
+
+ These indicate whether directories; symbolic links; device files; or pipes
+ and sockets can be made within the filesystem.
+
+ * ``FSINFO_CAP_RESOURCE_FORKS``
+
+ This indicates if the filesystem supports resource forks.
+
+ * ``FSINFO_CAP_NAME_CASE_INDEP``
+ * ``FSINFO_CAP_NAME_NON_UTF8``
+ * ``FSINFO_CAP_NAME_HAS_CODEPAGE``
+
+ These indicate if the filesystem supports case-independent file names,
+ whether the filenames are non-utf8 (see the "FSINFO_ATTR_NAME_ENCODING"
+ attribute) and whether a codepage is in use to transliterate them (see
+ the "FSINFO_ATTR_NAME_CODEPAGE" attribute).
+
+ * ``FSINFO_CAP_SPARSE``
+
+ This indicates if a filesystem supports sparse files.
+
+ * ``FSINFO_CAP_NOT_PERSISTENT``
+
+ This indicates if a filesystem is not persistent.
+
+ * ``FSINFO_CAP_NO_UNIX_MODE``
+
+ This indicates if a filesystem doesn't support UNIX mode bits (though they
+ may be manufactured from other bits, such as Windows file attribute flags).
+
+ * ``FSINFO_CAP_HAS_ATIME``
+ * ``FSINFO_CAP_HAS_BTIME``
+ * ``FSINFO_CAP_HAS_CTIME``
+ * ``FSINFO_CAP_HAS_MTIME``
+
+ These indicate which timestamps a filesystem supports (access, birth,
+ change, modify). The range and resolutions can be queried with the
+ "FSINFO_ATTR_TIMESTAMP_INFO" attribute.
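
Note (not part of the patch): the element-length encoding described for
FSINFO_ATTR_PARAMETERS and FSINFO_ATTR_LSM_PARAMETERS (7 bits per byte, most
significant group first, MSB set on every byte but the last) could be sketched
as below. The function names example_encode_len() and example_decode_len()
are hypothetical and are not helpers provided by this patch. Limiting the
length to 32 bits is an assumption made for the sketch, chosen because the
document caps opaque blobs at INT_MAX bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical encoder: write "len" as 7-bit groups, most significant
 * group first, with the MSB set on every byte except the last.
 * Returns the number of bytes written (1 to 5); "out" needs 5 bytes.
 */
size_t example_encode_len(uint32_t len, unsigned char *out)
{
	size_t n = 1, i;
	uint32_t tmp = len;

	assert(out != NULL);

	/* Count how many 7-bit groups are needed. */
	while (tmp >>= 7)
		n++;

	for (i = 0; i < n; i++) {
		unsigned int shift = 7 * (unsigned int)(n - 1 - i);
		unsigned char b = (len >> shift) & 0x7f;

		if (i != n - 1)
			b |= 0x80;	/* more length bytes follow */
		out[i] = b;
	}
	return n;
}

/*
 * Hypothetical decoder: read a length from buf[0..size).  Returns the
 * number of bytes consumed, or 0 if the buffer ends before the final
 * length byte or the value would not fit in 32 bits.
 */
size_t example_decode_len(const unsigned char *buf, size_t size,
			  uint32_t *len)
{
	uint32_t v = 0;
	size_t i;

	for (i = 0; i < size; i++) {
		if (v >> 25)
			return 0;	/* another 7 bits would overflow */
		v = (v << 7) | (buf[i] & 0x7f);
		if (!(buf[i] & 0x80)) {
			*len = v;
			return i + 1;
		}
	}
	return 0;			/* ran out of input */
}
```

A reader of the blob would call the decoder once for the key length, skip
that many bytes, and repeat for the value, until the buffer is exhausted.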