Re: [PATCH V5 00/12] Enable per-file/per-directory DAX operations V5
From: Darrick J. Wong
Date: Fri Apr 03 2020 - 14:40:07 EST
On Fri, Apr 03, 2020 at 11:18:43AM -0700, Ira Weiny wrote:
> On Fri, Apr 03, 2020 at 07:03:38PM +0200, Jan Kara wrote:
> > On Fri 03-04-20 08:48:29, Ira Weiny wrote:
> > > On Fri, Apr 03, 2020 at 09:27:31AM +0200, Christoph Hellwig wrote:
> > > > On Thu, Apr 02, 2020 at 01:55:19PM -0700, Ira Weiny wrote:
> > > > > > I'd just return an error for that case, don't play silly games like
> > > > > > evicting the inode.
> > > > >
> > > > > I think I agree with Christoph here. But I want to clarify. I was heading in
> > > > > a direction of failing the ioctl completely. But we could have the flag change
> > > > > with an appropriate error which could let the user know the change has been
> > > > > delayed.
> > > > >
> > > > > But I don't immediately see what error code is appropriate for such an
> > > > > indication. Candidates I can envision:
> > > > >
> > > > > EAGAIN
> > > > > ERESTART
> > > > > EUSERS
> > > > > EINPROGRESS
> > > > >
> > > > > None are perfect but I'm leaning toward EINPROGRESS.
> > > >
> > > > I really, really dislike that idea. The whole point of not forcing
> > > > evictions is to make it clear - no this inode is "busy" you can't
> > > > do that. A reasonably smart application can try to evict itself.
> > >
> > > I don't understand. What Darrick proposed would never need any
> > > evictions. If the file has blocks allocated the FS_XFLAG_DAX flag can
> > > not be changed. So I don't see what good eviction would do at all.
> >
> > I guess there's some confusion here (may well be than on my side). Darrick
> > propose that we can switch FS_XFLAG_DAX only when file has no blocks
> > allocated - fine by me. But that still does not mean than we can switch
> > S_DAX immediately, does it? Because that would still mean we need to switch
> > aops on living inode and that's ... difficult and Christoph didn't want to
> > clutter the code with it.
> >
> > So I've understood Darrick's proposal as: Just switch FS_XFLAG_DAX flag,
> > S_DAX flag will magically switch when inode gets evicted and the inode gets
> > reloaded from the disk again. Did I misunderstand anything?
> >
> > And my thinking was that this is surprising behavior for the user and so it
> > will likely generate lots of bug reports along the lines of "DAX inode flag
> > does not work!". So I was pondering how to make the behavior less
> > confusing... The ioctl I've suggested was just a poor attempt at that.
>
> Ok but then I don't understand Christophs comment to "just return an error for
> that case"? Which case?
>
> >
> > > > But returning an error and doing a lazy change anyway is straight from
> > > > the playbook for arcane and confusing API designs.
> > >
> > > Jan countered with a proposal that the FS_XFLAG_DAX does change with
> > > blocks allocated. But that S_DAX would change on eviction. Adding that
> > > some eviction ioctl could be added.
> >
> > No, I didn't mean that we can change FS_XFLAG_DAX with blocks allocated. I
> > was still speaking about the case without blocks allocated.
>
> Ah ok good point. But again what 'error' do we return when FS_XFLAG_DAX
> changed but S_DAX did not?
>
> >
> > > You then proposed just returning an error for that case. (This lead me to
> > > believe that you were ok with an eviction based change of S_DAX.)
> > >
> > > So I agreed that changing S_DAX could be delayed until an explicit eviction.
> > > But, to aid the 'smart application', a different error code could be used to
> > > indicate that the FS_XFLAG_DAX had been changed but that until that explicit
> > > eviction occurs S_DAX would remain.
> > >
> > > So I don't fully follow what you mean by 'lazy change'?
> > >
> > > Do you still really, really dislike an explicit eviction method for changing
> > > the S_DAX flag?
> > >
> > > If FS_XFLAG_DAX can never be changed on a file with blocks allocated and the
> > > user wants to change the mode of operations on their 'data'; they would have to
> > > create a new file with the proper setting and move the data there. For example
> > > copy the file into a directory marked FS_XFLAG_DAX==true?
> > >
> > > I'm ok with either interface as I think both could be clear if documented.
> >
> > I agree that what Darrick suggested is technically easily doable and can be
> > documented. But it is not natural behavior (i.e., different than all inode
> > flags we have) and we know how careful people are when reading
> > documentation...
> >
>
> Ok For 5.8 why don't we not allow FS_XFLAG_DAX to be changed on files _at_
> _all_...
>
> In summary:
>
> - Applications must call statx to discover the current S_DAX state.
Ok.
> - There exists an advisory file inode flag FS_XFLAG_DAX that is set based on
> the parent directory FS_XFLAG_DAX inode flag. (There is no way to change
> this flag after file creation.)
>
> If FS_XFLAG_DAX is set and the fs is on pmem then it will enable S_DAX at
> inode load time; if FS_XFLAG_DAX is not set, it will not enable S_DAX.
> Unless overridden...
Ok, fine with me. :)
> - There exists a dax= mount option.
>
> "-o dax=off" means "never set S_DAX, ignore FS_XFLAG_DAX"
> "-o nodax" means "dax=off"
I surveyed the three fses that support dax and found that none of the
filesystems actually have a 'nodax' flag. Now would be the time not to
add such a thing, and make people specify dax=off instead. It would
be handy if we could have a single fsparam_enum for figuring out the dax
mount options.
> "-o dax=always" means "always set S_DAX (at least on pmem), ignore FS_XFLAG_DAX"
> "-o dax" by itself means "dax=always"
> "-o dax=iflag" means "follow FS_XFLAG_DAX" and is the default
>
> - There exists an advisory directory inode flag FS_XFLAG_DAX that can be
> changed at any time. The flag state is copied into any files or
> subdirectories when they are created within that directory. If programs
> require file access runs in S_DAX mode, they'll have to create those files
"...they must create..."
> inside a directory with FS_XFLAG_DAX set, or mount the fs with an
> appropriate dax mount option.
Otherwise seems ok to me.
--D
>
>
> ???
>
> Ira
>