Re: [PATCH 0/5] Enable per-file/directory DAX operations

From: Boaz Harrosh
Date: Thu Oct 24 2019 - 10:05:53 EST


On 24/10/2019 10:34, Dave Chinner wrote:
> On Thu, Oct 24, 2019 at 05:31:13AM +0300, Boaz Harrosh wrote:
<>
>
> The on disk DAX flag is inherited from the parent directory at
> create time. Hence an admin only need to set it on the data
> directory of the application when first configuring it, and
> everything the app creates will be configured for DAX access
> automatically.
>

Yes I said that as well. But again I am concerned that this is the
opposite of our Intention. As you said the WRITEs are slow and
do not scale so what we like, and why we have the all problem, is
to WRITE *none*-DAX. And if so then how do we turn the bit ON later
for the fast READs.

> Or, alternatively, mkfs sets the flag on the root dir so that
> everything in the filesystem uses DAX by default (through
> inheritance) unless the admin turns off the flag on a directory
> before it starts to be used

> or on a set of files after they have
> been created (because DAX causes problems)...
>

Yes exactly this can not be done currently.

> So, yeah, there's another problem with the basic assertion that we
> only need to allow the on disk flag to be changed on zero length
> files: we actually want to be able to -clear- the DAX flag when the
> file has data attached to it, not just when it is an empty file...
>

Exactly, This is my concern. And the case that I see most useful is the
opposite where I want to turn it ON, for DAX fast READs.

>> What if, say in XFS when setting the DAX-bit we take all the three write-locks
>> same as a truncate. Then we check that there are no active page-cache mappings
>> ie. a single opener. Then allow to set the bit. Else return EBUISY. (file is in use)
>
> DAX doesn't have page cache mappings, so anything that relies on
> checking page cache state isn't going to work reliably.

I meant on the opposite case, Where the flag was OFF and I want it ON for
fast READs. In that case if I have any users there are pages on the
xarray.
BTW the opposite is also true if we have active DAX users we will have
DAX entries in the xarray. What we want is that there are *no* active
users while we change the file-DAX-mode. Else we fail the change.

> I also seem
> to recall that there was a need to take some vm level lock to really
> prevent page fault races, and that we can't safely take that in a
> safe combination with all the filesystem locks we need.
>

We do not really care with page fault races in the Kernel as long
as I protect the xarray access and these are protected well if we
take truncate locking. But we have a bigger problem that you pointed
out with the change of the operations vector pointer.

I was thinking about this last night. One way to do this is with
file-exclusive-lock. Correct me if I'm wrong:
file-exclusive-readwrite-lock means any other openers will fail and
if there are openers already the lock will fail. Which is what we want
no? to make sure we are the exclusive user of the file while we change
the op vector.
Now the question is if we force the application to take the lock and
Kernel only check that we are locked. Or Kernel take the lock within
the IOCTL.

Lets touch base. As I understand the protocol we want to establish with
the administration tool is:

- File is created, written. read...

- ALL file handles are closed, there are no active users
- File open by single opener for the purpose of changing the DAX-mode
- lock from all other opens
- change the DAX-mode, op vectors
- unlock-exlusivness

- File activity can resume...

That's easy to say, But how can we enforce this protocol?

> Cheers,
> Dave.
>

Thanks Dave
Boaz