Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)

From: Jeffrey E Altman
Date: Wed Aug 12 2020 - 23:56:56 EST


On 8/12/2020 2:18 PM, Linus Torvalds (torvalds@xxxxxxxxxxxxxxxxxxxx) wrote:
> What's wrong with fstatfs()? All the extra magic metadata seems to not
> really be anything people really care about.
>
> What people are actually asking for seems to be some unique mount ID,
> and we have 16 bytes of spare information in 'struct statfs64'.
>
> All the other fancy fsinfo stuff seems to be "just because", and like
> complete overdesign.

Hi Linus,

Is there any existing, filesystem agnostic method by which userland
applications can determine the properties of the filesystem in which a
directory or file is stored?

Over the past year I've observed the opendev/openstack community
struggle with performance issues caused by rsync's inability to
determine whether the source and destination objects' last update times
have the same resolution and valid time range. If the source file system
supports 100 nanosecond granularity and the destination file system
supports one second granularity, any source file with a non-zero
fractional seconds timestamp will appear to have changed compared to the
copy in the destination filesystem which discarded the fractional
seconds during the last sync. Sure, the end user could use the
--modify-window=1 option to tell rsync to add fuzz to the comparisons,
but that introduces the possibility that a file updated a fraction of a
second after an rsync execution would not be synchronized on the next
run when both source and target have fine grained timestamps. If
the userland sync processes have access to the source and destination
filesystem time capabilities, they can make more intelligent decisions
without explicit user input. At a minimum, the timestamp properties
that are important to know include the range of valid timestamps and the
resolution. Some filesystems support unsigned 32-bit time starting at
the UNIX epoch. Others support signed 32-bit time relative to the UNIX
epoch. Still others, such as FAT and NTFS, use alternative epochs,
ranges, and resolutions.
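
To make the comparison concrete, here is a minimal Python sketch of how
a sync tool could compare modification times if it knew each
filesystem's timestamp range and resolution. It is an illustration
only, not rsync's code; `TimeCaps` and its fields are hypothetical
stand-ins for whatever an interface such as fsinfo might report.

```python
from dataclasses import dataclass

NS_PER_SEC = 1_000_000_000


@dataclass(frozen=True)
class TimeCaps:
    """Hypothetical per-filesystem timestamp capabilities (not a real API)."""
    granularity_ns: int  # e.g. 100 for 100 ns, NS_PER_SEC for one second
    min_sec: int         # earliest representable second
    max_sec: int         # latest representable second


def normalize(ts_ns: int, caps: TimeCaps) -> int:
    """Clamp a nanosecond timestamp to the range, then truncate it to the
    granularity, approximating what the filesystem would store."""
    lo = caps.min_sec * NS_PER_SEC
    hi = caps.max_sec * NS_PER_SEC + (NS_PER_SEC - 1)
    clamped = max(lo, min(ts_ns, hi))
    return clamped - clamped % caps.granularity_ns


def mtime_unchanged(src_ns: int, dst_ns: int,
                    src_caps: TimeCaps, dst_caps: TimeCaps) -> bool:
    """True if the destination holds what it would have stored for the
    source timestamp, so no fuzz window is needed."""
    expected = normalize(normalize(src_ns, src_caps), dst_caps)
    return expected == normalize(dst_ns, dst_caps)
```

With a 100 ns source and a one second destination, a source timestamp
with fractional seconds compares equal to its truncated copy. With two
fine grained filesystems the comparison stays exact, which is the case
where --modify-window=1 can miss an update.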

Another case where lack of filesystem properties is problematic is "df
--local" which currently relies upon string comparisons of file system
names to determine if the underlying file system is local or remote.
This requires that the gnulib maintainers have knowledge of all file
system implementations, their published names, and which category
they belong to. Patches have been accepted in the past year to add
"smb3", "afs", and "gpfs" to the list of remote file systems. There are
many more remote filesystems that have yet to be added including
"cephfs", "lustre", "gluster", etc.

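The weakness of the name-based approach can be sketched in a few lines
of Python. The name set below is illustrative and deliberately
incomplete; it is not gnulib's actual table. The "remote" property is a
hypothetical flag that a filesystem could report about itself through
an interface such as fsinfo.

```python
# Illustrative, incomplete set of names treated as remote.
# This is NOT gnulib's real list; it only mirrors the names cited above.
KNOWN_REMOTE_NAMES = {"smb3", "afs", "gpfs"}


def is_remote_by_name(fs_type: str) -> bool:
    """Name-based check: correct only for names someone has added."""
    return fs_type in KNOWN_REMOTE_NAMES


def is_remote_by_property(fs_props: dict) -> bool:
    """Property-based check using a hypothetical 'remote' flag supplied
    by the filesystem itself, so no central name table is required."""
    return bool(fs_props.get("remote", False))


# A remote filesystem whose name is missing from the table is
# misclassified by name but not by property.
lustre = {"type": "lustre", "remote": True}
misclassified = not is_remote_by_name(lustre["type"])
correct = is_remote_by_property(lustre)
```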
In many cases, the filesystem properties cannot be inferred from the
filesystem name. For network file systems, these properties might
depend upon the remote server capabilities or even the properties
associated with a particular volume or share. Consider the case of a
remote file server that supports 64-bit 100ns time but which for
backward compatibility exports certain volumes or shares with more
restrictive capabilities. Or the case of a network file system protocol
that has evolved over time and gained new capabilities.

For the AFS community, fsinfo offers a method of exposing some server
and volume properties that are obtained via "path ioctls" in OpenAFS and
AuriStorFS. Some examples of properties that might be exposed include
answers to questions such as:

* what is the volume cell id? perhaps a uuid.
* what is the volume id in the cell? unsigned 64-bit integer
* where is a mounted volume hosted? which fileservers, named by uuid
* what is the block size? 1K, 4K, ...
* how many blocks are in use or available?
* what is the quota (thin provisioning), if any?
* what is the reserved space (fat provisioning), if any?
* how many vnodes are present?
* what is the vnode count limit, if any?
* when was the volume created and last updated?
* what is the file size limit?
* are byte range locks supported?
* are mandatory locks supported?
* how many entries can be created within a directory?
* are cross-directory hard links supported?
* are directories just-send-8, case-sensitive, case-preserving, or
case-insensitive?
* if not just-send-8, what character set is used?
* if Unicode, what normalization rules? etc.
* are per-object acls supported?
* what volume maximum acl is assigned, if any?
* what volume security policy (authn, integ, priv) is assigned, if any?
* what is the replication policy, if any?
* what is the volume encryption policy, if any?
* what is the volume compression policy, if any?
* are server-to-server copies supported?
* which of atime, ctime and mtime does the volume support?
* what is the permitted timestamp range and resolution?
* are xattrs supported?
* what is the xattr maximum name length?
* what is the xattr maximum object size?
* is the volume currently reachable?
* is the volume immutable?
* etc ...

It's true that there isn't widespread use of these filesystem properties
by today's userland applications, but that might be due to the lack of
standard interfaces necessary to acquire the information. For example,
userland frameworks for parallel i/o HPC applications such as HDF5,
PnetCDF and ROMIO require each supported filesystem to provide its own
proprietary "driver" which does little more than expose the filesystem
properties necessary to optimize the layout of file stream data
structures. With something like "fsinfo" it would be much easier to
develop these HPC frameworks in a filesystem agnostic manner. This
would permit applications built upon these frameworks to use the best
Linux filesystem available for the workload and not simply the ones for
which proprietary "drivers" have been published.

Although I am sympathetic to the voices in the community that would
prefer to start over with a different architectural approach, David's
fsinfo has been under development for more than two years. It has not
been developed in a vacuum but in parallel with other kernel components
that have been merged during that time frame. From my reading of this
thread and those that preceded it, fsinfo has also been developed with
input from significant userland development communities that intend to
leverage the syscall interface as soon as it becomes available. The
March 2020 discussion of fsinfo received positive feedback not only from
within Red Hat but from other parties as well.

Since no one has stepped up to provide an alternative approach in the
last five months, how long should those who desire access to the
functionality be expected to wait for it?

What is the likelihood that an alternative robust solution will be
available in the next merge window or two?

Is the design so horrid that it is better to go without the
functionality than to live with the imperfections?

I for one would like to see this functionality be made available sooner
rather than later. I know my end users would benefit from the
availability of fsinfo.

Thank you for listening. Stay healthy and safe, and please wear a mask.

Jeffrey Altman
