[RFC PATCH V2 00/12] Enable per-file/directory DAX operations V2
From: ira . weiny
Date: Fri Jan 10 2020 - 14:30:00 EST
From: Ira Weiny <ira.weiny@xxxxxxxxx>
At LSF/MM'19 [1] [2] we discussed applications that overestimate memory
consumption due to their inability to detect whether the kernel will
instantiate page cache for a file, and cases where a global dax enable via a
mount option is too coarse.
The following patch series enables selecting the use of DAX on individual files
and/or directories on xfs, and lays some groundwork to do so in ext4. In this
scheme the dax mount option can be omitted to allow the per-file property to
take effect.
The insight at LSF/MM was to separate the per-mount or per-file "physical"
capability switch from an "effective" attribute for the file.
At LSF/MM we discussed the difficulties of switching the mode of a file with
active mappings / page cache. It was thought the races could be avoided by
limiting mode flips to 0-length files.
However, this turns out to not be true.[3] This is because address space
operations (a_ops) may be in use at any time the inode is referenced and users
have expressed a desire to be able to change the mode on a file with data in
it. For those reasons this patch set allows changing the mode flag on a file
as long as it is not current mapped.
Furthermore, DAX is a property of the inode and as such, many operations other
than address space operations need to be protected during a mode change.
Therefore callbacks are placed within the inode operations and used to lock the
inode as appropriate.
As in V1, Users are able to query the effective and physical flags separately
at any time. Specifically the addition of the statx attribute bit allows them
to ensure the file is operating in the mode they intend. This 'effective flag'
and physical flags could differ when the filesystem is mounted with the dax
flag for example.
It should be noted that the physical DAX flag inheritance is not shown in this
patch set as it was maintained from previous work on XFS. The physical DAX flag
and it's inheritance will need to be added to other file systems for user
control.
Finally, extensive testing was performed which resulted in a couple of bug fix
and clean up patches. Specifically:
fs: remove unneeded IS_DAX() check
fs/xfs: Fix truncate up
'Fix truncate up' deserves specific attention because I'm not 100% sure it is
the correct fix. Without that patch fsx testing failed within a few minutes
with this error.
Mapped Write: non-zero data past EOF (0x3b0da) page offset 0xdb is 0x3711
With 'Fix truncate up' running fsx while changing modes can run for hours but I
have seen 2 other errors in the same genre after many hours of continuous
testing.
They are:
READ BAD DATA: offset = 0x22dc, size = 0xcc7e, fname = /mnt/pmem/dax-file
Mapped Read: non-zero data past EOF (0x3309e) page offset 0x9f is 0x6ab4
After seeing the patches to fix stale data exposure problems[4] I'm more
confident now that all 3 of these errors are a latent bug rather than a bug in
this series itself.
However, because of these failures I'm only submitting this set RFC.
[1] https://lwn.net/Articles/787973/
[2] https://lwn.net/Articles/787233/
[3] https://lkml.org/lkml/2019/10/20/96
[4] https://patchwork.kernel.org/patch/11310511/
To: linux-kernel@xxxxxxxxxxxxxxx
Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: "Darrick J. Wong" <darrick.wong@xxxxxxxxxx>
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Cc: Dave Chinner <david@xxxxxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Cc: "Theodore Y. Ts'o" <tytso@xxxxxxx>
Cc: Jan Kara <jack@xxxxxxx>
Cc: linux-ext4@xxxxxxxxxxxxxxx
Cc: linux-xfs@xxxxxxxxxxxxxxx
Cc: linux-fsdevel@xxxxxxxxxxxxxxx