Re: [PATCH 1/4] statx: Add a system call to make enhanced file info available

From: Jeff Layton
Date: Fri Nov 18 2016 - 12:17:44 EST

Next message: JÃrÃme Glisse: "[HMM v13 00/18] HMM (Heterogeneous Memory Management) v13"
Previous message: Walt Feasel: "[PATCH v3] staging: vme: vme_user.c Spelling corrections"
In reply to: David Howells: "Re: [PATCH 1/4] statx: Add a system call to make enhanced file info available"
Next in thread: David Howells: "Re: [PATCH 1/4] statx: Add a system call to make enhanced file info available"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, 2016-11-18 at 09:36 +0000, David Howells wrote:
> Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> > > (5) Data version number: Could be used by userspace NFS servers [Aneesh
> > > Kumar].
> > >
> > > Can also be used to modify fill_post_wcc() in NFSD which retrieves
> > > i_version directly, but has just called vfs_getattr(). It could get
> > > it from the kstat struct if it used vfs_xgetattr() instead.
> >
> > This needs a much clearer name that stx_version as "version" is
> > entirely ambiguous. e.g. Inodes have internal version numbers to
> > disambiguate life cycles. and there are versioning filesystems
> > which have multiple versions of the same file.
>
> We've already been through that. I wanted to call it stx_data_version but
> that got argued down to stx_version. The problem is that what the version
> number means is entirely filesystem dependent, and it might not just reflect
> changes in the data.
>

It had better not just reflect data changes.

knfsd populates the NFSv4 change attribute from inode->i_version. It
_must_ have changed between subsequent queries if either the data or
metadata has changed (basically whenever you would update either the
ctime or the mtime).

> > So if stx_version this is intended to export the internal filesystem
> > inode change counter (i.e. inode->i_version) then lets call it that:
> > stx_modification_count. It's clear and unambiguous as to what it
> > represents, especially as this counter is more than just a "data
> > modification" counter - inode metadata modifications will also
> > cause it to change....
>
> I disagree that it's unambiguous. It works like mtime, right?
>

More like ctime + mtime mashed together.

> Which wouldn't be of use for certain filesystems. An example of this would be
> AFS, where it's incremented by 1 each time a write is committed, but is not
> updated for metadata changes. This is what matters for data caching.
>

No. Basically the rules are that if something in the inode data or
metadata changed, then it must be a "larger" value (also accounting for
wraparound). So you also need to change it (usually by incrementing it)
when doing namespace changes that involve it (renames, unlinks, etc.).

> > > (13) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags.
> > > Note that the Linux IOC flags are a mess and filesystems such as Ext4
> > > define flags that aren't in linux/fs.h, so translation in the kernel
> > > may be a necessity (or, possibly, we provide the filesystem type too).
> >
> > And we now also have FS_IOC_FSGETXATTR that extends the flags
> > and information userspace can get from filesystems. It makes little
> > sense to now add xstat() and not add everything this interface
> > also exports...
>
> I'm not sure I agree. Stuff like extent sizes and extent hint flags seem like
> very specialised things that don't belong in the stat interface. The project
> ID, on the other hand is arguably a good thing to include. But we can always
> add this later.
>

Yes. The entire point of this interface is that it is extendable.

I'm all for simplicity here and just adding the bare minimum of fields
to get the new interface in. The main thing we must do here though is
to ID anything that would hobble us from being able to add new
attributes later.

Adding new fields in later piecemeal patches allows us to demonstrate
that that concept actually works.

> Note that are also two variants of the fsxattr struct defined in the kernel -
> though one is a superset of the other.
>
> > > Time fields are split into separate seconds and nanoseconds fields to make
> > > packing easier and the granularities can be queried with the filesystem
> > > info system call. Note that times will be negative if before 1970; in such
> > > a case, the nanosecond fields will also be negative if not zero.
> >
> > So what happens in ten years time when we want to support
> > femptosecond resolution in the timestamp interface? We've got to
> > change everything to 64 bit? Shouldn't we just make everything
> > timestamp related 64 bit?
>
> You really think we're going to have accurate timestamps with a resolution of
> a millionth of a nanosecond? This means you're going to be doing a 64-bit
> division every time you want a nanosecond timestamp.
>
> Also, violet light has a period of ~1.2fs so your 1fs oscillator might emit UV
> radiation.
>

Could contemporary machines get away with just shifting down by 32
bits?

> > > The bits defined in the stx_attributes field convey information about a
> > > file, how it is accessed, where it is and what it does. The following
> > > attributes map to FS_*_FL flags and are the same numerical value:
> >
> > Please isolate the new interface flags completely from the FS_*_FL
> > values. We should not repeat the mistake of tying values derived
> > from filesystem specific on-disk values to a user interface.
>
> Why shouldn't I make a numerical correspondance with at least one set of such
> flags? I get to define a whole new numberspace and can pick the values I
> want. I see no particular reason to pick explicitly non-corresponding values
> where possible.
>
> Now, I can agree that the code should say, for example:
>
> if (ext4->flag & FS_COMPRESSED_FL)
> statx.attr |= STATX_ATTR_COMPRESSED;
> if (ext4->flag & FS_ENCRYPTED_FL)
> statx.attr |= STATX_ATTR_ENCRYPTED;
> if (ext4->flag & FS_IMMUTABLE_FL)
> statx.attr |= STATX_ATTR_IMMUTABLE;
> ...
>
> and that the *compiler* should collapse this to:
>
> statx.attr |= ext4->flag & mask;
>
> but see:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78317
>
> David
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Next message: JÃrÃme Glisse: "[HMM v13 00/18] HMM (Heterogeneous Memory Management) v13"
Previous message: Walt Feasel: "[PATCH v3] staging: vme: vme_user.c Spelling corrections"
In reply to: David Howells: "Re: [PATCH 1/4] statx: Add a system call to make enhanced file info available"
Next in thread: David Howells: "Re: [PATCH 1/4] statx: Add a system call to make enhanced file info available"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]