Re: [PATCH v2 2/3] mm, dax: add VM_DAX flag for DAX VMAs

From: Dave Chinner
Date: Thu Sep 15 2016 - 19:08:11 EST


On Thu, Sep 15, 2016 at 10:01:03AM -0700, Dan Williams wrote:
> On Thu, Sep 15, 2016 at 1:26 AM, Christoph Hellwig <hch@xxxxxx> wrote:
> > On Wed, Sep 14, 2016 at 11:54:38PM -0700, Dan Williams wrote:
> >> The DAX property, page cache bypass, of a VMA is only detectable via the
> >> vma_is_dax() helper to check the S_DAX inode flag. However, this is
> >> only available internal to the kernel and is a property that userspace
> >> applications would like to interrogate.
> >
> > They have absolutely no business knowing such an implementation detail.
>
> Hasn't that train already left the station with FS_XFLAG_DAX?

No, that's an admin flag, not a runtime hint for applications. The
flag being set on an inode does not mean that DAX is actually in
use; it is ignored if the backing device is not DAX capable.

> The other problem with hiding the DAX property is that it turns out to
> not be a transparent acceleration feature. See xfs/086 xfs/088
> xfs/089 xfs/091 which fail with DAX and, as far as I understand, it is
> due to the fact that DAX disallows delayed allocation behavior.

Which is not a bug, nor is it something that app developers should
be surprised by.

i.e. subtle differences in error reporting behaviour occur in
filesystems /all the time/. Run the test on a non-DAX filesystem
with an extent size hint: it fails /exactly the same way as DAX/.
Run it with direct IO: it fails the same way as DAX. Run it with
synchronous writes: it fails the same way as DAX.

IOWs, if an app can't handle the way DAX reports errors, then it
is /broken/. Delayed allocation requires checking the return value
of fsync() or close() to catch the allocation error, and many more
apps get that wrong than get wrong the immediate errors returned
from write()...
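
A minimal sketch in C of the error checking described above, assuming
a plain POSIX environment. The helper name write_file_checked() is
hypothetical and not part of any existing API. It checks write(),
fsync() and close() in turn, so an allocation failure is caught
whether the filesystem reports it immediately (DAX, direct IO, sync
writes) or defers it to writeback (delayed allocation).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes from buf to path and report an
 * error from any step that can fail. With delayed allocation an
 * ENOSPC may only surface at fsync() or close(); with DAX it usually
 * comes back from write(). Returns 0 on success or a negated errno.
 */
static int write_file_checked(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;

    /* write() may be short or interrupted, so loop until done. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            int err = -errno;   /* save before close() clobbers errno */
            close(fd);
            return err;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Deferred (e.g. delalloc writeback) errors are reported here. */
    if (fsync(fd) < 0) {
        int err = -errno;
        close(fd);
        return err;
    }

    /* close() can also report a deferred error on some filesystems. */
    if (close(fd) < 0)
        return -errno;

    return 0;
}
```

The caller sees one return code regardless of which step failed,
which is the point: an app written this way doesn't care whether
the filesystem uses DAX or delayed allocation.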

Anyway: to demonstrate that nothing is actually broken, and that
you might sometimes need to fix tests and send patches to
fstests@xxxxxxxxxxxxxxx, this makes xfs/086 pass for me on DAX:

--- a/tests/xfs/086
+++ b/tests/xfs/086
@@ -96,7 +96,8 @@ _scratch_mount

echo "+ modify files"
for x in `seq 1 64`; do
- $XFS_IO_PROG -f -c "pwrite -S 0x62 0 ${blksz}" "${TESTFILE}.${x}" >> $seqres.full
+ $XFS_IO_PROG -f -c "pwrite -S 0x62 0 ${blksz}" "${TESTFILE}.${x}" \
+ >> $seqres.full 2>&1
done
umount "${SCRATCH_MNT}"

Cheers,

Dave.

--
Dave Chinner
david@xxxxxxxxxxxxx