Re: [PATCH 1/7] xfs: always use DAX if mount option is used

From: Ross Zwisler
Date: Wed Sep 27 2017 - 12:15:46 EST


On Tue, Sep 26, 2017 at 11:40:01PM -0700, Christoph Hellwig wrote:
> On Tue, Sep 26, 2017 at 11:30:57AM -0600, Ross Zwisler wrote:
> > I agree that Christoph's idea about having the system intelligently adjust to
> > use DAX based on performance information it gathers about the underlying
> > persistent memory (probably via the HMAT on x86_64 systems) is interesting,
> > but I think we're still a ways away from that.
>
> So what are the blockers for getting started?

Well, I don't know if platforms that support HMAT + PMEM are widely available,
but we have all the details in the ACPI spec, so we could begin to code it up
and things will "just work" when platforms arrive.

> > FWIW, as my patches suggest and Jan observed I think that we should allow
> > users to turn on DAX by treating the inode flag and the mount flag as an 'or'
> > operation. i.e. you get DAX if either the mount option is specified or if the
> > inode flag is set, and you can continue to manipulate the per-inode flag as
> > you want regardless of the mount option. I think this provides maximum
> > flexibility of the mechanism to select DAX without enforcing policy.
>
> IFF we stick to the dax flag that's the only workable way. The only
> major issue I still see with that is that this allows unprivileged
> users to enable DAX on any file they own / have write access to.
> So there isn't really any way for the sysadmin to effectively disable
> the DAX path.

Hum, I wonder if maybe we need/want three different mount modes? What about:

autodax (the default): the filesystem is free to use DAX or not, as it sees
fit and thinks is optimal. For the time being we can make this mean "don't
use DAX", and phase in DAX usage as we add support for the HMAT, etc.

Users can manually turn on DAX for a given inode by setting the DAX inode
flag, but there is no way for the user to *prevent* DAX for an inode - the
kernel can always choose to turn it on.

MAP_DIRECT and MAP_SYNC work.

nodax: Don't use DAX. The kernel won't choose to use DAX, and any DAX inode
flags will be ignored. This gives the sysadmin the override that I think
you're looking for. The user can still manipulate the inode flags as they see
fit.

MAP_DIRECT and MAP_SYNC both fail.

dax: Use DAX for all inodes in the filesystem. Again the inode flags are
essentially ignored, but the user can manipulate the inode flags as they see
fit. This is basically unchanged from how it works today, modulo the bug
where DAX can get turned off if you unset the inode flag where it wasn't even
set (patch 1 in my series).

MAP_DIRECT and MAP_SYNC work.
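
To make the three modes above concrete, here is a rough sketch of the decision
logic in plain C. It is only an illustration of the proposal, not code from the
patch series: the enum, the function names and the kernel_prefers_dax()
placeholder are all made up for this example and do not correspond to existing
XFS or VFS identifiers.

```c
#include <stdbool.h>

/* Hypothetical names for the three proposed mount modes. */
enum dax_mount_mode {
	DAX_MODE_AUTO,	/* "autodax", the proposed default */
	DAX_MODE_NEVER,	/* "nodax" */
	DAX_MODE_ALWAYS	/* "dax" */
};

/*
 * Placeholder for a future policy (e.g. driven by HMAT data).
 * For the time being autodax means "don't use DAX" unless the
 * user asks for it, so this always returns false.
 */
static bool kernel_prefers_dax(void)
{
	return false;
}

/* Would this inode use DAX, given the mount mode and its DAX inode flag? */
static bool inode_uses_dax(enum dax_mount_mode mode, bool inode_flag)
{
	switch (mode) {
	case DAX_MODE_NEVER:
		return false;	/* inode flag ignored: sysadmin override */
	case DAX_MODE_ALWAYS:
		return true;	/* inode flag ignored: DAX everywhere */
	case DAX_MODE_AUTO:
	default:
		/* The user can turn DAX on but cannot prevent it. */
		return inode_flag || kernel_prefers_dax();
	}
}

/* Would MAP_DIRECT / MAP_SYNC mappings be accepted in this mode? */
static bool map_sync_allowed(enum dax_mount_mode mode)
{
	return mode != DAX_MODE_NEVER;
}
```

Note that in every mode the inode flag itself can still be set and cleared
freely; the mode only decides whether the flag has any effect.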

> > Does it make sense at this point to just start a "dax" man page that can
> > contain info about the mount options, inode flags, kernel config options, how
> > to get PMDs, etc? Or does this documentation need to be sprinkled around more
> > in existing man pages?
>
> A dax manpage would be good.

Okay, I'll start with a manpage, and once we agree on what's in there we can
start working on code again. :)