Re: [PATCH] dax, pmem: add support for msync

From: Ross Zwisler
Date: Wed Sep 02 2015 - 15:05:31 EST


On Tue, Sep 01, 2015 at 03:18:41PM +0300, Boaz Harrosh wrote:
> So the approach we took was a bit different to exactly solve these
> problem, and to also not over flush too much. here is what we did.
>
> * At vm_operations_struct we also override the .close vector (say call it dax_vm_close)
>
> * At dax_vm_close() on writable files call ->fsync(,vma->vm_start, vma->vm_end,)
> (We have an inode flag if the file was actually dirtied, but even if not, that will
> not be that bad, so a file was opened for write, mmapped, but actually never
> modified. Not a lot of these, and the do nothing cl_flushing is very fast)
>
> * At ->fsync() do the actual cl_flush for all cases but only iff
> if (mapping_mapped(inode->i_mapping) == 0)
> return 0;
>
> This is because data written not through mmap is already persistent and we
> do not need the cl_flushing
>
> Apps expect all these to work:
> 1. open mmap m-write msync ... close
> 2. open mmap m-write fsync ... close
> 3. open mmap m-write unmap ... fsync close
>
> 4. open mmap m-write sync ...

So basically you made close have an implicit fsync? What about the flow that
looks like this:

5. open mmap close m-write

This guy definitely needs an msync/fsync at the end to make sure that the
m-write becomes durable.

Also, the CLOSE(2) man page specifically says that a flush does not occur at
close:
A successful close does not guarantee that the data has been
successfully saved to disk, as the kernel defers writes. It
is not common for a filesystem to flush the buffers when the stream is
closed. If you need to be sure that the data is physically stored,
use fsync(2). (It will depend on the disk hardware at this point.)

I don't think that adding an implicit fsync to close is the right solution -
we just need to get msync and fsync correctly working.

> The first 3 are supported with above, because what happens is that at [3]
> the fsync actually happens on unmap and fsync is redundant in that case.
>
> The only broken scenario is [3]. We do not have a list of "dax-dirty" inodes
> per sb to iterate on and call inode-sync on. This cause problems mostly in
> freeze because with actual [3] scenario the file will be eventually closed
> and persistent, but after the call to sync returns.
>
> Its on my TODO to fix [3] based on instructions from Dave.
> The mmap call will put the inode on the list and the dax_vm_close will
> remove it. One of the regular dirty list should be used as suggested by
> Dave.

I believe in the above two paragraphs you meant [4], so the

4. open mmap m-write sync ...

case needs to be fixed so that we can detect DAX-dirty inodes?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/