Re: [PATCH] dax, pmem: add support for msync

From: Dave Chinner
Date: Wed Sep 02 2015 - 20:57:45 EST


On Wed, Sep 02, 2015 at 12:13:21PM +0300, Kirill A. Shutemov wrote:
> On Wed, Sep 02, 2015 at 08:49:22AM +1000, Dave Chinner wrote:
> > On Tue, Sep 01, 2015 at 01:08:04PM +0300, Kirill A. Shutemov wrote:
> > > On Tue, Sep 01, 2015 at 09:38:03AM +1000, Dave Chinner wrote:
> > > > On Mon, Aug 31, 2015 at 12:59:44PM -0600, Ross Zwisler wrote:
> > > > Even for DAX, msync has to call vfs_fsync_range() for the filesystem to commit
> > > > the backing store allocations to stable storage, so there's not
> > > > getting around the fact msync is the wrong place to be flushing
> > > > DAX mappings to persistent storage.
> > >
> > > Why?
> > > IIUC, msync() doesn't have any requirements wrt metadata, right?
> >
> > Of course it does. If the backing store allocation has not been
> > committed, then after a crash there will be a hole in file and
> > so it will read as zeroes regardless of what data was written and
> > flushed.
>
> Any reason why backing store allocation cannot be committed on *_mkwrite?

Oh, I could change that if you want, it'll just be ridiculously
slow because it requires journal flushes on every page fault that
needs to change the filesytsem block map (i.e. every allocation and/or
every unwritten extent conversion).

Sycnhronous journalling requires flushing the log on every
transaction commit. That involves switching to a work queue, copying
the changes into a log buffer, issuing IO to flush the journal,
waiting for that to complete, etc. i.e. synchronous journalling
incurs a minimum overhead of 4 context switches per page fault that
needs to allocate/convert backing store, along with all the CPU time
needed to process the journal commit.

> diff --git a/mm/mmap.c b/mm/mmap.c
> index 3f78bceefe5a..f2e29a541e14 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1645,6 +1645,15 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> vma->vm_ops = &dummy_ops;
> }
>
> + /*
> + * Make sure that for VM_MIXEDMAP VMA has both
> + * vm_ops->page_mkwrite and vm_ops->pfn_mkwrite or has none.
> + */
> + if ((vma->vm_ops->page_mkwrite || vma->vm_ops->pfn_mkwrite) &&
> + vma->vm_flags & VM_MIXEDMAP) {
> + VM_BUG_ON_VMA(!vma->vm_ops->page_mkwrite, vma);
> + VM_BUG_ON_VMA(!vma->vm_ops->pfn_mkwrite, vma);
> + }

Doesn't really help developers that don't use CONFIG_DEBUG_VM. i.e
it's the FS developers that you need to warn, not VM developers -
in this case a "WARN_ON_ONCE" is probably more appropriate.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/