Re: [RFC PATCH 2/2] mm, fs: daxfile, an interface for byte-addressable updates to pmem

From: Andy Lutomirski
Date: Thu Jun 22 2017 - 00:08:48 EST

On Wed, Jun 21, 2017 at 5:02 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> You seem to be calling the "fdatasync on every page fault" the

It's the opposite of fdatasync(). It needs to sync whatever metadata
is needed to find the data. The data doesn't need to be synced.

> "lightweight" option. That's the brute-force-with-big-hammer
> solution - it's most definitely not lightweight as every page fault
> has extra overhead to call ->fsync(). Sure, the API is simple, but
> the runtime overhead is significant.

It's lightweight in terms of its impact on the filesystem. It doesn't
need any persistent setup -- you can just use it.

> Even if you are considering the complexity of the APIs, it's hardly
> a "heavyweight" when it only requires a single call to fallocate()
> before mmap() to set up the immutable extents on the file...

So what would the exact semantics be? In particular, how can it fail?
If I do the fallocate(), is it absolutely promised that the extent
map won't get out of sync between what mmap sees and what's on disk?
Do user programs need to worry about colliding with each other when
one does fallocate() to DAXify a file and the other does fallocate()
to unDAXify a file? Does this particular fallocate() call still keep
its effect after a reboot?

These issues are why I think it would be nicer to have an API that
makes a particular mapping or fd be unconditionally *correct* and then
to provide something else that makes it avoid latency spikes.

Is there an actual concrete proposal that's reviewable?