Re: [PATCH V5 00/12] Enable per-file/per-directory DAX operations V5

From: Ira Weiny
Date: Fri Apr 03 2020 - 14:18:47 EST


On Fri, Apr 03, 2020 at 07:03:38PM +0200, Jan Kara wrote:
> On Fri 03-04-20 08:48:29, Ira Weiny wrote:
> > On Fri, Apr 03, 2020 at 09:27:31AM +0200, Christoph Hellwig wrote:
> > > On Thu, Apr 02, 2020 at 01:55:19PM -0700, Ira Weiny wrote:
> > > > > I'd just return an error for that case, don't play silly games like
> > > > > evicting the inode.
> > > >
> > > > I think I agree with Christoph here. But I want to clarify. I was heading in
> > > > a direction of failing the ioctl completely. But we could have the flag change
> > > > with an appropriate error which could let the user know the change has been
> > > > delayed.
> > > >
> > > > But I don't immediately see what error code is appropriate for such an
> > > > indication. Candidates I can envision:
> > > >
> > > > EAGAIN
> > > > ERESTART
> > > > EUSERS
> > > > EINPROGRESS
> > > >
> > > > None are perfect but I'm leaning toward EINPROGRESS.
> > >
> > > I really, really dislike that idea. The whole point of not forcing
> > > evictions is to make it clear - no this inode is "busy" you can't
> > > do that. A reasonably smart application can try to evict itself.
> >
> > I don't understand. What Darrick proposed would never need any
> > evictions. If the file has blocks allocated the FS_XFLAG_DAX flag can
> > not be changed. So I don't see what good eviction would do at all.
>
> I guess there's some confusion here (may well be than on my side). Darrick
> propose that we can switch FS_XFLAG_DAX only when file has no blocks
> allocated - fine by me. But that still does not mean than we can switch
> S_DAX immediately, does it? Because that would still mean we need to switch
> aops on living inode and that's ... difficult and Christoph didn't want to
> clutter the code with it.
>
> So I've understood Darrick's proposal as: Just switch FS_XFLAG_DAX flag,
> S_DAX flag will magically switch when inode gets evicted and the inode gets
> reloaded from the disk again. Did I misunderstand anything?
>
> And my thinking was that this is surprising behavior for the user and so it
> will likely generate lots of bug reports along the lines of "DAX inode flag
> does not work!". So I was pondering how to make the behavior less
> confusing... The ioctl I've suggested was just a poor attempt at that.

Ok but then I don't understand Christophs comment to "just return an error for
that case"? Which case?

>
> > > But returning an error and doing a lazy change anyway is straight from
> > > the playbook for arcane and confusing API designs.
> >
> > Jan countered with a proposal that the FS_XFLAG_DAX does change with
> > blocks allocated. But that S_DAX would change on eviction. Adding that
> > some eviction ioctl could be added.
>
> No, I didn't mean that we can change FS_XFLAG_DAX with blocks allocated. I
> was still speaking about the case without blocks allocated.

Ah ok good point. But again what 'error' do we return when FS_XFLAG_DAX
changed but S_DAX did not?

>
> > You then proposed just returning an error for that case. (This lead me to
> > believe that you were ok with an eviction based change of S_DAX.)
> >
> > So I agreed that changing S_DAX could be delayed until an explicit eviction.
> > But, to aid the 'smart application', a different error code could be used to
> > indicate that the FS_XFLAG_DAX had been changed but that until that explicit
> > eviction occurs S_DAX would remain.
> >
> > So I don't fully follow what you mean by 'lazy change'?
> >
> > Do you still really, really dislike an explicit eviction method for changing
> > the S_DAX flag?
> >
> > If FS_XFLAG_DAX can never be changed on a file with blocks allocated and the
> > user wants to change the mode of operations on their 'data'; they would have to
> > create a new file with the proper setting and move the data there. For example
> > copy the file into a directory marked FS_XFLAG_DAX==true?
> >
> > I'm ok with either interface as I think both could be clear if documented.
>
> I agree that what Darrick suggested is technically easily doable and can be
> documented. But it is not natural behavior (i.e., different than all inode
> flags we have) and we know how careful people are when reading
> documentation...
>

Ok For 5.8 why don't we not allow FS_XFLAG_DAX to be changed on files _at_
_all_...

In summary:

- Applications must call statx to discover the current S_DAX state.

- There exists an advisory file inode flag FS_XFLAG_DAX that is set based on
the parent directory FS_XFLAG_DAX inode flag. (There is no way to change
this flag after file creation.)

If FS_XFLAG_DAX is set and the fs is on pmem then it will enable S_DAX at
inode load time; if FS_XFLAG_DAX is not set, it will not enable S_DAX.
Unless overridden...

- There exists a dax= mount option.

"-o dax=off" means "never set S_DAX, ignore FS_XFLAG_DAX"
"-o nodax" means "dax=off"
"-o dax=always" means "always set S_DAX (at least on pmem), ignore FS_XFLAG_DAX"
"-o dax" by itself means "dax=always"
"-o dax=iflag" means "follow FS_XFLAG_DAX" and is the default

- There exists an advisory directory inode flag FS_XFLAG_DAX that can be
changed at any time. The flag state is copied into any files or
subdirectories when they are created within that directory. If programs
require file access runs in S_DAX mode, they'll have to create those files
inside a directory with FS_XFLAG_DAX set, or mount the fs with an
appropriate dax mount option.


???

Ira