Re: [PATCH v2 2/3] mm, dax: add VM_DAX flag for DAX VMAs
From: Dave Chinner
Date: Fri Sep 16 2016 - 01:36:51 EST
On Thu, Sep 15, 2016 at 07:04:27PM -0700, Dan Williams wrote:
> On Thu, Sep 15, 2016 at 6:24 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > On Thu, Sep 15, 2016 at 05:16:42PM -0700, Dan Williams wrote:
> >> On Thu, Sep 15, 2016 at 4:07 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >> > On Thu, Sep 15, 2016 at 10:01:03AM -0700, Dan Williams wrote:
> >> >> On Thu, Sep 15, 2016 at 1:26 AM, Christoph Hellwig <hch@xxxxxx> wrote:
> >> >> > On Wed, Sep 14, 2016 at 11:54:38PM -0700, Dan Williams wrote:
> >> >> >> The DAX property, page cache bypass, of a VMA is only detectable via the
> >> >> >> vma_is_dax() helper to check the S_DAX inode flag. However, this is
> >> >> >> only available internal to the kernel and is a property that userspace
> >> >> >> applications would like to interrogate.
> >> >> >
> >> >> > They have absolutely no business knowing such an implementation detail.
> >> >>
> >> >> Hasn't that train already left the station with FS_XFLAG_DAX?
> >> >
> >> > No, that's an admin flag, not a runtime hint for applications. Just
> >> > because that flag is set on an inode, it does not mean that DAX is
> >> > actually in use - it will be ignored if the backing dev is not dax
> >> > capable.
> >>
> >> What's the point of an admin flag if an admin can't do cat /proc/<pid
> >> of interest>/smaps, or some other mechanism, to validate that the
> >> setting the admin cares about is in effect?
> >
> > Sorry, I don't follow - why would you be looking at mapping file
> > regions in /proc to determine if some file somewhere in a filesystem
> > has a specific flag set on it or not?
> >
> > FS_XFLAG_DAX is an inode attribute flag, not something you can
> > query or administrate through mmap:
> >
> > I.e.
> > # xfs_io -c "lsattr" -c "chattr +x" -c lsattr -c "chattr -x" -c "lsattr" foo
> > --------------- foo
> > --------------x foo
> > --------------- foo
> > #
> >
> > What happens when that flag is set on an inode is determined by a
> > whole bunch of other things that are completely separate to the
> > management of the inode flag itself.
>
> Right, I understand that, but how does an admin audit those "bunch of
> other things"
Filesystem mounts checks all the various stuff that determines
whether DAX can be used. It logs to the console that it is "Dax
capable". Any file that then has FS_XFLAG_DAX set will result in DAX
being used. There is no other possibility when these two things are
reported.
/me points at runtime diagnostic tracepoints like
trace_xfs_file_dax_read() and notes that dax is sadly lacking in
diagnostic tracepoints.
Besides, userspace can't do anything useful with this information,
because the FS_XFLAG_DAX can be changed /at any time/ by an admin.
And the filesystem is free to remove it at any time, too, if it
needs to (e.g. file gets reflinked or snapshotted).
That's right, an inode can dynamically change from DAX to non-DAX
underneath the application, and the application /will not notice/.
That's because changing the flag will sync and invalidate the
existing mappings and the next application access will simply fault
it back in using whatever mechanism the inode is now configured
with.
Plain and simple: userspace has absolutely no fucking idea of
whether DAX is enabled or not, and whatever the kernel returns to
userspace above the DAX configuration is stale before it even got
out of the kernel....
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx