Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
From: Ira Weiny
Date: Wed Apr 08 2020 - 18:11:02 EST
On Wed, Apr 08, 2020 at 02:28:30PM -0700, Dan Williams wrote:
> On Wed, Apr 8, 2020 at 2:02 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
[snip]
> > >
> > > void
> > > xfs_diflags_to_iflags(
> > > 	struct xfs_inode *ip,
> > > 	bool init)
> > > {
> > > 	struct inode *inode = VFS_I(ip);
> > > 	unsigned int xflags = xfs_ip2xflags(ip);
> > > 	unsigned int flags = 0;
> > >
> > > 	inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME |
> > > 			    S_DAX);
> >
> > We don't want to clear the dax flag here, ever, if it is already
> > set. That is an externally visible change and opens us up (again) to
> > races where IS_DAX() changes half way through a fault path. IOWs, avoiding
> > clearing the DAX flag was something I did explicitly in the above
> > code fragment.
> >
> > And it makes the logic clearer by pre-calculating the new flags,
> > then clearing and setting the inode flags together, rather than
> > having them separated at the top and bottom of the function.
> >
> > This leads to an obvious conclusion: if we never clear the in-memory
> > S_DAX flag, we can actually clear the on-disk flag safely, so that
> > next time the inode cycles into memory it won't be using DAX. IOWs,
> > admins can stop the applications, clear the DAX flag and drop
> > caches. This should result in the inode being recycled and when the
> > app is restarted it will run without DAX. No need for deleting files,
> > copying large data sets, etc just to turn off an inode flag.
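
For illustration, the "pre-calculate, then clear and set together, but never
clear S_DAX" logic described above might look roughly like the userspace
sketch below. The DEMO_* bit values and demo_update_iflags() are hypothetical
stand-ins, not the kernel's definitions, and gating S_DAX on the init argument
is an assumption rather than something stated in this thread.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's S_* inode flag bits. */
#define DEMO_S_SYNC      (1u << 0)
#define DEMO_S_NOATIME   (1u << 1)
#define DEMO_S_APPEND    (1u << 2)
#define DEMO_S_IMMUTABLE (1u << 3)
#define DEMO_S_DAX       (1u << 4)

/* Bits that are cleared and re-set on every update. S_DAX is left out on purpose. */
#define DEMO_MUTABLE_MASK \
	(DEMO_S_SYNC | DEMO_S_NOATIME | DEMO_S_APPEND | DEMO_S_IMMUTABLE)

/*
 * Return the new in-memory flags given the current ones and the flags the
 * on-disk inode asks for. An already-set DAX bit always survives.
 */
unsigned int
demo_update_iflags(unsigned int i_flags, unsigned int wanted, bool init)
{
	/* Pre-calculate the new flag set first. */
	unsigned int flags = wanted & DEMO_MUTABLE_MASK;

	/* Assumption: DAX may only be turned on when the inode is instantiated. */
	if (init && (wanted & DEMO_S_DAX))
		flags |= DEMO_S_DAX;

	/* Clear and set together; the mask excludes DAX, so it is never cleared. */
	i_flags &= ~DEMO_MUTABLE_MASK;
	i_flags |= flags;
	return i_flags;
}
```

With this shape, clearing the on-disk DAX flag has no effect on an inode that
is already in memory; only the next instantiation (init == true) sees the new
on-disk state.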
>
> Makes sense, but is that sufficient? I recall you saying there might
> be a multitude of other reasons that the inode is not evicted, not the
> least of which is races [1]. Does this need another flag, let's call it
> "dax toggle", to track the "I requested the inode to clear the flag,
> but on cache-flush + restart the inode never got evicted" case? S_DAX
> almost plays this role, but it loses the ability to audit which files
> are pending an inode eviction event. So the dax-toggle flag indicates
> to the kernel to xor the toggle value with the inode flag on inode
> instantiation and the dax inode flag is never directly manipulated by
> the ioctl path.
>
> [1]: http://lore.kernel.org/r/20191025003603.GE4614@xxxxxxxxxxxxxxxxxxx
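
As a rough sketch of the "dax toggle" idea above: the ioctl path only records a
toggle, and the effective DAX state is the stored flag xor the toggle when the
inode is instantiated. Everything here (struct demo_disk_inode, the demo_*
helpers, and the "pending" check) is hypothetical and only models the proposal
in userspace; none of it is an existing XFS or VFS interface.

```c
#include <stdbool.h>

/* Hypothetical on-disk state: the DAX flag plus the proposed toggle bit. */
struct demo_disk_inode {
	bool dax_flag;		/* never written by the ioctl path */
	bool dax_toggle;	/* set when a change has been requested */
};

/* ioctl path: request a DAX state without touching dax_flag directly. */
void
demo_request_dax(struct demo_disk_inode *d, bool want_dax)
{
	d->dax_toggle = (d->dax_flag != want_dax);
}

/* Inode instantiation: the effective state is flag xor toggle. */
bool
demo_effective_dax(const struct demo_disk_inode *d)
{
	return d->dax_flag != d->dax_toggle;
}

/*
 * Audit helper: a change is still pending if the in-memory state differs
 * from what the next instantiation would produce.
 */
bool
demo_change_pending(const struct demo_disk_inode *d, bool in_memory_dax)
{
	return demo_effective_dax(d) != in_memory_dax;
}
```

The point of keeping the toggle separate is the audit case raised above: a
file whose inode was never evicted remains identifiable, because its in-memory
state still disagrees with flag xor toggle.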
FWIW I think we should continue down this simplified interface and get this
done for 5.8. If we can come up with a way to do a delayed mode change I'm all for
looking into that. But there has been too much controversy/difficulty about
changing the bit on a file.
So let's table this idea until >= 5.9
Ira