Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()

From: Darrick J. Wong
Date: Thu Apr 09 2020 - 12:59:48 EST


On Thu, Apr 09, 2020 at 08:29:44AM -0700, Ira Weiny wrote:
> On Wed, Apr 08, 2020 at 05:30:21PM -0700, Darrick J. Wong wrote:
>
> [snip]
>
> >
> > But you're right, this thing keeps swirling around and around and around
> > because we can't ever get to agreement on this. Maybe I'll just become
> > XFS BOFH MAINTAINER and make a decision like this:
> >
> > 1 Applications must call statx to discover the current S_DAX state.
> >
> > 2 There exists an advisory file inode flag FS_XFLAG_DAX that is set based on
> > the parent directory FS_XFLAG_DAX inode flag. This advisory flag can be
> > changed after file creation, but it does not immediately affect the S_DAX
> > state.
> >
> > If FS_XFLAG_DAX is set and the fs is on pmem then it will enable S_DAX at
> > inode load time; if FS_XFLAG_DAX is not set, it will not enable S_DAX.
> > Unless overridden...
> >
> > 3 There exists a dax= mount option.
> >
> > "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
> > "-o dax=always" means "always set S_DAX (at least on pmem), ignore FS_XFLAG_DAX"
> > "-o dax" by itself means "dax=always"
> > "-o dax=iflag" means "follow FS_XFLAG_DAX" and is the default
>
> per-Dave '-o dax=inode'

Ok.

> >
> > 4 There exists an advisory directory inode flag FS_XFLAG_DAX that can be
> > changed at any time. The flag state is copied into any files or
> > subdirectories when they are created within that directory.
>
> Good.
>
> > If programs
> > require file access runs in S_DAX mode, they must create those files
> > inside a directory with FS_XFLAG_DAX set, or mount the fs with an
> > appropriate dax mount option.
>
> Why do we need this to be true? If the FS_XFLAG_DAX flag can be cleared why
> not set it and allow the S_DAX change to occur later just like clearing it?
> The logic is exactly the same.

I think I'll just delete this sentence since I started pushing all that
verbiage towards (5).

To answer your question, yes, FS_XFLAG_DAX can be set on a file at any
time, the same as it can be cleared at any time. Sorry that was
unclear, I'll fix that for the next draft (below).

> >
> > 5 Programs that require a specific file access mode (DAX or not DAX) must
>
> s/must/can/

Ok.

> > do one of the following:
> >
> > (a) create files in directories with the FS_XFLAG_DAX flag set as needed;
>
> Again if we allow clearing the flag why not setting? So this is 1 option they
> 'can' do.
>
> >
> > (b) have the administrator set an override via mount option;
> >
> > (c) if they need to change a file's FS_XFLAG_DAX flag so that it does not
> > match the S_DAX state (as reported by statx), they must cause the
> > kernel to evict the inode from memory. This can be done by:
> >
> > i> closing the file;
> > ii> re-opening the file and using statx to see if the fs has
> > changed the S_DAX flag;
>
> i and ii need to be 1 step the user must follow.

Ok, I'll combine these.

> > iii> if not, either unmount and remount the filesystem, or
> > closing the file and using drop_caches.
> >
> > 6 I no longer think it's too wild to require that users who want to
> > squeeze every last bit of performance out of the particular rough and
> > tumble bits of their storage also be exposed to the difficulties of
> > what happens when the operating system can't totally virtualize those
> > hardware capabilities. Your high performance sports car is not a
> > Toyota minivan, as it were.
>
> I'm good with this statement. But I think we need to clean up the verbiage for
> the documentation... ;-)

Heh. :)

> Thanks for the summary. I like these to get everyone on the same page. :-D

And today:

1. There exists an in-kernel access mode flag S_DAX that is set when
file accesses go directly to persistent memory, bypassing the page
cache. Applications must call statx to discover the current S_DAX
state (STATX_ATTR_DAX).

2. There exists an advisory file inode flag FS_XFLAG_DAX that is
inherited from the parent directory FS_XFLAG_DAX inode flag at file
creation time. This advisory flag can be set or cleared at any
time, but doing so does not immediately affect the S_DAX state.

Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
and the fs is on pmem then it will enable S_DAX at inode load time;
if FS_XFLAG_DAX is not set, it will not enable S_DAX.

3. There exists a dax= mount option.

"-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX."

"-o dax=always" means "always set S_DAX (at least on pmem),
and ignore FS_XFLAG_DAX."

"-o dax" is an alias for "dax=always".

"-o dax=inode" means "follow FS_XFLAG_DAX" and is the default.

4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
be set or cleared at any time. The flag state is copied into any
files or subdirectories when they are created within that directory.

5. Programs that require a specific file access mode (DAX or not DAX)
can do one of the following:

(a) Create files in directories that the FS_XFLAG_DAX flag set as
needed; or

(b) Have the administrator set an override via mount option; or

(c) Set or clear the file's FS_XFLAG_DAX flag as needed. Programs
must then cause the kernel to evict the inode from memory. This
can be done by:

i> Closing the file and re-opening the file and using statx to
see if the fs has changed the S_DAX flag; and

ii> If the file still does not have the desired S_DAX access
mode, either unmount and remount the filesystem, or close
the file and use drop_caches.

6. It's not unreasonable that users who want to squeeze every last bit
of performance out of the particular rough and tumble bits of their
storage also be exposed to the difficulties of what happens when the
operating system can't totally virtualize those hardware
capabilities. Your high performance sports car is not a Toyota
minivan, as it were.

Given our overnight discussions, I don't think it'll be difficult to
hoist XFS_IDONTCACHE to the VFS so that 5.c.i is enough to change the
S_DAX state if nobody else is using the file.

--D

> >
> > > Furthermore, if we did want an interface like that why not allow
> > > the on-disk flag to be set as well as cleared?
> >
> > Well, why not - it's why I implemented the flag in the first place!
> > The only problem we have here is how to safely change the in-memory
> > DAX state, and that largely has nothing to do with setting/clearing
> > the on-disk flag....
>
> With the above change to xfs_diflags_to_iflags() I think we are ok here.
>
> Ira
>
--D

> Ira
>
> >
> > I think (like Dave said) that if you set XFS_IDONTCACHE on the inode
> > when you change the DAX flag, the VFS will kill the inode the instant
> > the last user close()s the file. Then 5.c.ii will actually work.
> >
> > --D
> >
> > > >
> > > > > Furthermore, if we did want an interface like that why not allow
> > > > > the on-disk flag to be set as well as cleared?
> > > >
> > > > Well, why not - it's why I implemented the flag in the first place!
> > > > The only problem we have here is how to safely change the in-memory
> > > > DAX state, and that largely has nothing to do with setting/clearing
> > > > the on-disk flag....
> > >
> > > With the above change to xfs_diflags_to_iflags() I think we are ok here.
> > >
> > > Ira
> > >