Re: [PATCH 00/17] VFS: Filesystem information and notifications [ver #17]

From: Miklos Szeredi
Date: Mon Feb 24 2020 - 05:25:16 EST


On Fri, Feb 21, 2020 at 9:21 PM James Bottomley
<James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, 2020-02-21 at 18:01 +0000, David Howells wrote:
> [...]
> > ============================
> > FILESYSTEM INFORMATION QUERY
> > ============================
> >
> > The fsinfo() system call allows information about the filesystem at a
> > particular path point to be queried as a set of attributes, some of
> > which may have more than one value.
> >
> > Attribute values are of four basic types:
> >
> > (1) Version dependent-length structure (size defined by type).
> >
> > (2) Variable-length string (up to 4096, including NUL).
> >
> > (3) List of structures (up to INT_MAX size).
> >
> > (4) Opaque blob (up to INT_MAX size).
> >
> > Attributes can have multiple values either as a sequence of values or
> > a sequence-of-sequences of values and all the values of a particular
> > attribute must be of the same type.
> >
> > Note that the values of an attribute *are* allowed to vary between
> > dentries within a single superblock, depending on the specific dentry
> > that you're looking at, but all the values of an attribute have to be
> > of the same type.
> >
> > I've tried to make the interface as light as possible, so
> > integer/enum attribute selector rather than string and the core does
> > all the allocation and extensibility support work rather than leaving
> > that to the filesystems. That means that for the first two attribute
> > types, the filesystem will always see a sufficiently-sized buffer
> > allocated. Further, this removes the possibility of the filesystem
> > gaining access to the userspace buffer.
> >
> >
> > fsinfo() allows a variety of information to be retrieved about a
> > filesystem and the mount topology:
> >
> > (1) General superblock attributes:
> >
> > - Filesystem identifiers (UUID, volume label, device numbers,
> > ...)
> > - The limits on a filesystem's capabilities
> > - Information on supported statx fields and attributes and IOC
> > flags.
> > - A variety of single-bit flags indicating supported capabilities.
> > - Timestamp resolution and range.
> > - The amount of space/free space in a filesystem (as statfs()).
> > - Superblock notification counter.
> >
> > (2) Filesystem-specific superblock attributes:
> >
> > - Superblock-level timestamps.
> > - Cell name.
> > - Server names and addresses.
> > - Filesystem-specific information.
> >
> > (3) VFS information:
> >
> > - Mount topology information.
> > - Mount attributes.
> > - Mount notification counter.
> >
> > (4) Information about what the fsinfo() syscall itself supports,
> > including the type and struct/element size of attributes.
> >
> > The system is extensible:
> >
> > (1) New attributes can be added. There is no requirement that a
> > filesystem implement every attribute. Note that the core VFS keeps a
> > table of types and sizes so it can handle future extensibility rather
> > than delegating this to the filesystems.
> >
> > (2) Version length-dependent structure attributes can be made larger
> > and have additional information tacked on the end, provided it keeps
> > the layout of the existing fields. If an older process asks for a
> > shorter structure, it will only be given the bits it asks for. If a
> > newer process asks for a longer structure on an older kernel, the
> > extra space will be set to 0. In all cases, the size of the data
> > actually available is returned.
> >
> > In essence, the size of a structure is that structure's version: a
> > smaller size is an earlier version and a later version includes
> > everything that the earlier version did.
> >
> > (3) New single-bit capability flags can be added. This is a
> > structure-typed attribute and, as such, (2) applies. Any bits you
> > wanted but the kernel doesn't support are automatically set to 0.
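
The size-as-version rule in (2) can be sketched in plain userspace C. This is only an illustration under assumptions: copy_versioned() is a hypothetical helper, not the code in the patch set. It shows the three behaviours the text describes: a shorter request is truncated, a longer request is zero-padded, and the size actually available is always returned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the kernel's implementation): copy a structure
 * that is kernel_size bytes long into a caller buffer of user_size bytes.
 */
size_t copy_versioned(void *user_buf, size_t user_size,
		      const void *kernel_buf, size_t kernel_size)
{
	/* Hand over only as many bytes as both sides know about. */
	size_t n = user_size < kernel_size ? user_size : kernel_size;

	memcpy(user_buf, kernel_buf, n);

	/* A newer caller on an older kernel sees the extra space as 0. */
	if (user_size > n)
		memset((char *)user_buf + n, 0, user_size - n);

	/* Report how much data was actually available. */
	return kernel_size;
}
```

A caller can compare the return value against the size it asked for to tell which fields the running kernel really filled in.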
> >
> > fsinfo() may be called like the following, for example:
> >
> >     struct fsinfo_params params = {
> >         .at_flags = AT_SYMLINK_NOFOLLOW,
> >         .flags = FSINFO_FLAGS_QUERY_PATH,
> >         .request = FSINFO_ATTR_AFS_SERVER_ADDRESSES,
> >         .Nth = 2,
> >     };
> >     struct fsinfo_server_address address;
> >     len = fsinfo(AT_FDCWD, "/afs/grand.central.org/doc", &params,
> >                  &address, sizeof(address));
> >
> > The above example would query an AFS filesystem to retrieve the
> > address list for the 3rd server, and:
> >
> >     struct fsinfo_params params = {
> >         .at_flags = AT_SYMLINK_NOFOLLOW,
> >         .flags = FSINFO_FLAGS_QUERY_PATH,
> >         .request = FSINFO_ATTR_AFS_CELL_NAME,
> >     };
> >     char cell_name[256];
> >     len = fsinfo(AT_FDCWD, "/afs/grand.central.org/doc", &params,
> >                  &cell_name, sizeof(cell_name));
> >
> > would retrieve the name of an AFS cell as a string.
> >
> > In future, I want to make fsinfo() capable of querying a context
> > created by fsopen() or fspick(), e.g.:
> >
> >     fd = fsopen("ext4", 0);
> >     struct fsinfo_params params = {
> >         .flags = FSINFO_FLAGS_QUERY_FSCONTEXT,
> >         .request = FSINFO_ATTR_PARAMETERS,
> >     };
> >     char buffer[65536];
> >     fsinfo(fd, NULL, &params, &buffer, sizeof(buffer));
> >
> > even if that context doesn't currently have a superblock attached. I
> > would prefer this to contain length-prefixed strings so that there's
> > no need to insert escaping, especially as any character, including
> > '\', can be used as the separator in cifs and so that binary
> > parameters can be returned (though that is a lesser issue).
>
> Could I make a suggestion about how this should be done in a way that
> doesn't actually require the fsinfo syscall at all: it could just be
> done with fsconfig. The idea is based on something I've wanted to do
> for configfd but couldn't because otherwise it wouldn't substitute for
> fsconfig, but Christian made me think it was actually essential to the
> ability of the seccomp and other verifier tools in the critique of
> configfd and I believe the same critique applies here.
>
> Instead of making fsconfig functionally configure ... as in you pass
> the attribute name, type and parameters down into the fs specific
> handler and the handler does a string match and then verifies the
> parameters and then acts on them, make it table configured, so what
> each fstype does is register a table of attributes which can be got and
> optionally set (with each attribute having a get and optional set
> function). We'd have multiple tables per fstype, so the generic VFS
> can register a table of attributes it understands for every fstype
> (things like name, uuid and the like) and then each fs type would
> register a table of fs specific attributes following the same pattern.
> The system would examine the fs specific table before the generic one,
> allowing overrides. fsconfig would have the ability to both get and
> set attributes, permitting retrieval as well as setting (which is how I
> get rid of the fsinfo syscall), we'd have a global parameter, which
> would retrieve the entire table by name and type so the whole thing is
> introspectable because the upper layer knows a priori all the
> attributes which can be set for a given fs type and what type they are
> (so we can make more of the parsing generic). Any attribute which
> doesn't have a set routine would be read only and all attributes would
> have to have a get routine meaning everything is queryable.
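
A minimal sketch of the table-driven lookup described above, as plain userspace C. Every name here (struct fs_attr, myfs_attrs, generic_attrs, attr_lookup()) is hypothetical and only illustrates the shape of the idea: each attribute carries a mandatory get and an optional set, and the fs-specific table is searched before the generic one so that it can override.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical types and names throughout; none of this is an existing API. */
struct fs_attr {
	const char *name;
	int (*get)(char *buf, size_t len);	/* every attribute has a getter */
	int (*set)(const char *val);		/* NULL means read-only */
};

int generic_name_get(char *buf, size_t len) { return snprintf(buf, len, "generic"); }
int generic_uuid_get(char *buf, size_t len) { return snprintf(buf, len, "0000"); }
int myfs_name_get(char *buf, size_t len) { return snprintf(buf, len, "myfs"); }
int myfs_name_set(const char *val) { (void)val; return 0; }

/* Table the generic VFS would register for every fs type. */
const struct fs_attr generic_attrs[] = {
	{ "name", generic_name_get, NULL },
	{ "uuid", generic_uuid_get, NULL },
};

/* Table one fs type registers; its "name" overrides the generic one. */
const struct fs_attr myfs_attrs[] = {
	{ "name", myfs_name_get, myfs_name_set },
};

static const struct fs_attr *scan(const struct fs_attr *tbl, size_t n,
				  const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tbl[i].name, name) == 0)
			return &tbl[i];
	return NULL;
}

/* Search the fs-specific table first, then fall back to the generic one. */
const struct fs_attr *attr_lookup(const char *name)
{
	const struct fs_attr *a;

	a = scan(myfs_attrs, sizeof(myfs_attrs) / sizeof(myfs_attrs[0]), name);
	if (a)
		return a;
	return scan(generic_attrs,
		    sizeof(generic_attrs) / sizeof(generic_attrs[0]), name);
}
```

Because the tables are static data, a caller could walk them to list every attribute name and whether it is settable, which is the introspection property the proposal relies on.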

And that makes me wonder: would a
"/sys/class/fs/$ST_DEV/options/$OPTION" type interface be feasible for
this?
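
For illustration only, here is how userspace might form such a path. No such sysfs hierarchy exists today, and the "major:minor" spelling of $ST_DEV is an assumption made for this sketch, as is the helper name option_path():

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "/sys/class/fs/<maj>:<min>/options/<option>".
 * The layout is the one floated above; the "maj:min" encoding of $ST_DEV
 * is an assumption. Returns 0 on success, -1 if buf is too small.
 */
int option_path(char *buf, size_t len, unsigned int maj, unsigned int min,
		const char *option)
{
	int n = snprintf(buf, len, "/sys/class/fs/%u:%u/options/%s",
			 maj, min, option);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

The major and minor numbers would come from the st_dev field that stat() returns for any path on the filesystem, and each option would then be an ordinary file to read or write.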

Thanks,
Miklos