Re: [PATCH 00/13] dax, pmem: move cpu cache maintenance to libnvdimm
From: Christoph Hellwig
Date: Mon Jan 23 2017 - 10:58:23 EST
On Mon, Jan 23, 2017 at 06:37:18AM +0000, Matthew Wilcox wrote:
> Wow, DAX devices look painful and awful. I certainly don't want to be
> exposing the memory fronted by my network filesystem to userspace to
> access. That just seems like a world of pain and bad experiences.
So what is your interest in using DAX for your file system then, instead
of a private mechanism?
> Absolutely the filesystem (or perhaps better, the ACPI tables) need to
> mark that chunk of memory as reserved, but it's definitely not available
> for anyone to access without the filesystem being aware.
That does sound like a massive special case all over the stack.
But until we see it I think we should simply ignore this case and
concentrate on what we have right now.
> Even if we let the filesystem create a DAX device that doesn't show
> up in /dev (for example), Dan's patches don't give us a way to go
> from a file on the filesystem to a set of dax_ops.
Which doesn't make sense anyway. The entry points into the file system
are read + write and mmap, and the file system might then use libraries
to implement different types of I/O, such as the page cache or DAX.
> And it does need to be a per-file operation, eg to support a file on
> an XFS volume which might be on a RT device or a normal device.
> That was why I leaned towards an address_space operation, but I'd be
> happy to see an inode_operation instead.
Again, no. The layers above the file system have absolutely no business
knowing whether we're using DAX or page cache access, never mind the
details of how they are used. Assuming you want to use DAX-like semantics,
it's up to the lower level to expose the correct operations for
a given memory region. Right now these would just be Intel NFIT or
legacy e820 + ADR for regions marked as such in the memory map. If, say,
a hypervisor wants to expose a region that needs a special flush
call or even has requirements on the type of memcpy, it needs to
provide operations for this memory region. The user of this region
(DAX-native file system, pmem driver or device DAX) then needs
to use these methods.
And those pretty much are the methods Dan proposes here - it's
just that we should not tie them to block device operations, at
least not in the long run.
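
To illustrate the layering argued for above, here is a rough userspace
sketch, not actual kernel code. Every name in it (pmem_region,
pmem_region_ops, region_write, plain_ops) is made up for illustration
and does not correspond to anything in Dan's series or in libnvdimm. The
provider of a memory region supplies the copy and flush methods, and the
consumer only ever goes through them:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct pmem_region;

/* Hypothetical per-region methods, supplied by whoever exposes the region. */
struct pmem_region_ops {
	void (*copy_to)(struct pmem_region *r, size_t off,
			const void *src, size_t len);
	void (*flush)(struct pmem_region *r, size_t off, size_t len);
};

struct pmem_region {
	unsigned char *base;		/* start of the mapped region */
	size_t size;			/* length of the region in bytes */
	const struct pmem_region_ops *ops;
	unsigned int flushes;		/* demo-only counter, not a real field */
};

/* Provider for ordinary memory: plain memcpy, flush only counts calls. */
static void plain_copy_to(struct pmem_region *r, size_t off,
			  const void *src, size_t len)
{
	memcpy(r->base + off, src, len);
}

static void plain_flush(struct pmem_region *r, size_t off, size_t len)
{
	(void)off;
	(void)len;
	/* A real provider would write back CPU caches for [off, off + len). */
	r->flushes++;
}

static const struct pmem_region_ops plain_ops = {
	.copy_to = plain_copy_to,
	.flush = plain_flush,
};

/*
 * Consumer side (file system, pmem driver, device DAX): it never picks a
 * memcpy or flush variant itself, it just calls the region's methods.
 * Returns 0 on success, -1 if the range does not fit in the region.
 */
static int region_write(struct pmem_region *r, size_t off,
			const void *src, size_t len)
{
	assert(r && r->ops);
	if (off > r->size || len > r->size - off)
		return -1;
	r->ops->copy_to(r, off, src, len);
	r->ops->flush(r, off, len);
	return 0;
}
```

A hypervisor-backed region with a special flush call would, under this
assumption, just register a different ops table, and region_write()
would stay unchanged.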