Re: [RFC PATCH 2/2] mm, fs: daxfile, an interface for byte-addressable updates to pmem

From: Dan Williams
Date: Sun Jun 18 2017 - 21:52:03 EST


On Sun, Jun 18, 2017 at 1:18 AM, Christoph Hellwig <hch@xxxxxx> wrote:
> On Sat, Jun 17, 2017 at 08:15:05PM -0700, Dan Williams wrote:
>> The hang-up is that it requires per-fs enabling as it needs to be
>> careful to manage mmap_sem vs fs journal locks for example. I know the
>> in-development NOVA [1] filesystem is planning to support this out of
>> the gate. ext4 would be open to implementing it, but I think xfs is
>> cold on the idea. Christoph originally proposed it here [2], before
>> Dave went on to propose immutable semantics.
>>
>> [1]: https://github.com/NVSL/NOVA
>> [2]: https://lists.01.org/pipermail/linux-nvdimm/2016-February/004609.html
>
> And I stand by that statement. Let's get DAX stable first, and
> properly cleaned up (e.g. follow on work with separating it entirely
> from the block device). Then think hard about how most of the
> persistent memory technologies actually work, including the point that
> for a lot of workloads page cache will be required at least on the
> write side. And then come up with actual real use cases and we can
> look into it.

I see it differently. We're already at a good point in time to start
iterating on a fix for this issue. Ross and Jan have done a lot of
good work on the dax stability front, and the block-device separation
of dax is well underway.

> And stop trying to shoe-horn crap like this in.

The kernel shoe-horning all pmem+filesystem-dax applications into
abiding by page-cache semantics is a problem, and this RFC has already
helped move the needle on a couple of fronts:

1/ Swapfiles are subtly broken, which is worth fixing, and if that
fix gets us a synchronous-dax mode without major filesystem surgery,
then all the better.

2/ There's an appetite for just fixing this incrementally in each
filesystem's fault handler. So if ext4 were able to prove out an
interface / implementation for synchronous faults, we could go with
that instead of a pre-allocated + immutable interface and let other
filesystems set their own timelines.