Re: [RFC PATCH 2/2] mm, fs: daxfile, an interface for byte-addressable updates to pmem
From: Darrick J. Wong
Date: Tue Jun 20 2017 - 21:24:39 EST
On Wed, Jun 21, 2017 at 09:53:46AM +1000, Dave Chinner wrote:
> On Tue, Jun 20, 2017 at 09:17:36AM -0700, Dan Williams wrote:
> > On Tue, Jun 20, 2017 at 1:49 AM, Christoph Hellwig <hch@xxxxxx> wrote:
> > > [stripped giant fullquotes]
> > >
> > > On Mon, Jun 19, 2017 at 10:53:12PM -0700, Andy Lutomirski wrote:
> > >> But that's my whole point. The kernel doesn't really need to prevent
> > >> all these background maintenance operations -- it just needs to block
> > >> .page_mkwrite until they are synced. I think that whatever new
> > >> mechanism we add for this should be sticky, but I see no reason why
> > >> the filesystem should have to block reflink on a DAX file entirely.
> > >
> > > Agreed - IFF we want to support write through semantics this is the
> > > only somewhat feasible way. It still has massive downsides of forcing
> > > the full sync machinery to run from the page fault handler, which
> > > I'm rather scared of, but that's still better than creating a magic
> > > special case that isn't manageable at all.
> >
> > An immutable-extent DAX-file and a reflink-capable DAX-file are not
> > mutually exclusive,
>
> Actually, they are mutually exclusive: when the immutable extent DAX
> inode is breaking the extent sharing done during the reflink
> operation, the copy-on-write operation requires allocating and
> freeing extents on the inode that has immutable extents. Which, if
> the inode really has immutable extents, cannot be done.
>
> That said, if the extent sharing is broken on the other side of the
> reflink (i.e. the non-immutable inode created by the reflink) then
> the extent map of the inode with immutable extents will remain
> unchanged. i.e. there are two sides to this, and if you only see one
> side you might come to the wrong conclusion.
>
> However, we cannot guarantee that no writes occur to the inode with
> immutable extent maps (especially as the whole point is to allow
> userspace writes and commits without the kernel being involved), so
> extent sharing on immutable extent maps cannot be allowed...

Just to play devil's advocate...

/If/ you have rmap and /if/ you discover that there's only one
IOMAP_IMMUTABLE file owning this same block and /if/ you're willing to
relocate every other mapping on the whole filesystem, /then/ you could
/in theory/ support shared daxfiles.
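
To make that chain of /if/s concrete, here is a rough userspace sketch. It is a toy model only: struct toy_rmap_rec, toy_unshare_for_daxfile() and the next_free allocator are made-up names for illustration, not real XFS code, and a flat array stands in for the rmapbt.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names are real XFS structures. */
struct toy_rmap_rec {
	unsigned long block;	/* physical block number */
	unsigned long owner;	/* owning inode number */
	int owner_immutable;	/* nonzero if the owner's block map is immutable */
};

/*
 * Try to make 'block' exclusively owned by its single immutable owner.
 * Every other owner is "relocated" by pointing its record at a fresh
 * block taken from *next_free (a stand-in for a real allocator).
 *
 * Returns the number of relocated mappings, or -1 if the block does not
 * have exactly one immutable owner.
 */
static int toy_unshare_for_daxfile(struct toy_rmap_rec *recs, size_t nr,
				   unsigned long block,
				   unsigned long *next_free)
{
	size_t i;
	int immutable_owners = 0;
	int moved = 0;

	assert(recs != NULL && next_free != NULL);

	/* Pass 1: scan the reverse mappings to count immutable owners. */
	for (i = 0; i < nr; i++)
		if (recs[i].block == block && recs[i].owner_immutable)
			immutable_owners++;

	if (immutable_owners != 1)
		return -1;

	/* Pass 2: move every other mapping of this block somewhere else. */
	for (i = 0; i < nr; i++) {
		if (recs[i].block != block || recs[i].owner_immutable)
			continue;
		recs[i].block = (*next_free)++;
		moved++;
	}
	return moved;
}
```

Even in this toy form it takes two full scans of the reverse mappings plus one relocation per sharer, and a real version would also have to copy data and update every other file's block map.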
However, that's so many on-disk metadata lookups to shove into a
pagefault handler that I don't think anyone in XFSland would entertain
such an ugly fantasy. You'd be making a lot of metadata requests, and
you'd have to lock the rmapbt while grabbing inodes, which is insane.

Much easier to have a per-inode flag that says "the block map of this
file does not change" and put up with the restricted semantics.
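
For comparison, a minimal sketch of what the per-inode flag approach amounts to. Again this is a toy model: TOY_IFLAG_IMMUTABLE_MAP, struct toy_inode and the helpers are hypothetical names, and the error codes are my own assumption rather than anything the patch defines.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag and struct names, for illustration only. */
#define TOY_IFLAG_IMMUTABLE_MAP	0x1u

struct toy_inode {
	unsigned int flags;
	unsigned int nr_shared_extents;	/* extents currently shared via reflink */
};

static bool toy_map_is_immutable(const struct toy_inode *ip)
{
	return (ip->flags & TOY_IFLAG_IMMUTABLE_MAP) != 0;
}

/*
 * Reflink shares extents between src and dst.  Refuse if either block map
 * is frozen: sharing would later force a copy-on-write remap on src, and
 * the reflink itself changes dst's block map.
 */
static int toy_reflink(struct toy_inode *src, struct toy_inode *dst)
{
	if (toy_map_is_immutable(src) || toy_map_is_immutable(dst))
		return -EPERM;	/* error code is an assumption */
	src->nr_shared_extents++;
	dst->nr_shared_extents++;
	return 0;
}

/* Freeze the block map; refuse while any extent is still shared. */
static int toy_set_immutable_map(struct toy_inode *ip)
{
	if (ip->nr_shared_extents)
		return -EBUSY;	/* error code is an assumption */
	ip->flags |= TOY_IFLAG_IMMUTABLE_MAP;
	return 0;
}
```

Each check is a single flag test on an inode that is already in memory, with no metadata lookups at all, which is the whole attraction of the restricted semantics.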
> Dave Chinner