Re: [PATCH 3/3] dax: Handle write faults more efficiently

From: Andy Lutomirski
Date: Wed Jan 27 2016 - 01:02:24 EST


On Tue, Jan 26, 2016 at 8:17 PM, Matthew Wilcox <willy@xxxxxxxxxxxxxxx> wrote:
> On Mon, Jan 25, 2016 at 09:38:19AM -0800, Andy Lutomirski wrote:
>> On Mon, Jan 25, 2016 at 9:25 AM, Matthew Wilcox
>> <matthew.r.wilcox@xxxxxxxxx> wrote:
>> > From: Matthew Wilcox <willy@xxxxxxxxxxxxxxx>
>> >
>> > When we handle a write-fault on a DAX mapping, we currently insert a
>> > read-only mapping and then take the page fault again to convert it to
>> > a writable mapping. This is necessary for the case where we cover a
>> > hole with a read-only zero page, but when we have a data block already
>> > allocated, it is inefficient.
>> >
>> > Use the recently added vmf_insert_pfn_prot() to insert a writable mapping,
>> > even though the default VM flags say to use a read-only mapping.
>>
>> Conceptually, I like this. Do you need to make sure to do all the
>> do_wp_page work, though? (E.g. we currently update mtime in there.
>> Some day I'll fix that, but it'll be replaced with a set_bit to force
>> a deferred mtime update.)
>
> We update mtime in the ->fault handler of filesystems which support DAX
> like this:
>
> 	if (vmf->flags & FAULT_FLAG_WRITE) {
> 		sb_start_pagefault(inode->i_sb);
> 		file_update_time(vma->vm_file);
> 	}
>
> so I think we're covered.

A question that came up on IRC: if the page is a reflinked page on XFS
(whenever that feature lands), then presumably XFS has real work to do
in page_mkwrite. If so, what ensures that page_mkwrite gets called?

As a half-baked alternative to this patch, there's a generic
optimization for this case. do_shared_fault normally calls
do_page_mkwrite and installs the resulting page with the writable bit
set. But if __do_fault returns VM_FAULT_NOPAGE, then this
optimization is skipped. Could we add VM_FAULT_NOPAGE_READONLY (or
VM_FAULT_NOPAGE | VM_FAULT_READONLY) as a hint that a page was
installed but that it was installed readonly? If we did that, then
do_shared_fault could check that bit and go through the wp_page logic
rather than returning to userspace.
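
A minimal sketch of that control flow, written as a userspace toy model
rather than real mm code. VM_FAULT_READONLY is the hypothetical hint
proposed above; the flag values, struct model_fault, and every model_*
function are made up for illustration and only stand in for __do_fault,
do_shared_fault, and the wp_page logic.

```c
#include <stdbool.h>

/* Illustrative values only; not taken from the kernel headers. */
#define VM_FAULT_NOPAGE   0x0100u
#define VM_FAULT_READONLY 0x8000u	/* hypothetical hint from this mail */

struct model_pte {
	bool present;
	bool writable;
};

/* Hypothetical state for one write fault on a shared mapping. */
struct model_fault {
	struct model_pte pte;
	bool handler_installs_pte;	/* handler maps the pfn itself, as DAX does */
	int mkwrite_calls;		/* how often the mkwrite step ran */
};

/* Stand-in for __do_fault(): either installs a read-only PTE itself
 * or hands a page back to the caller (modelled as returning 0). */
static unsigned int model_do_fault(struct model_fault *f)
{
	if (f->handler_installs_pte) {
		f->pte.present = true;
		f->pte.writable = false;
		return VM_FAULT_NOPAGE | VM_FAULT_READONLY;
	}
	return 0;
}

/* Stand-in for the wp_page logic: notify the filesystem, then upgrade. */
static void model_wp_page(struct model_fault *f)
{
	f->mkwrite_calls++;
	f->pte.writable = true;
}

/* Stand-in for do_shared_fault() with the proposed check added. */
static unsigned int model_do_shared_fault(struct model_fault *f)
{
	unsigned int ret = model_do_fault(f);

	if (ret & VM_FAULT_NOPAGE) {
		/* Proposed: upgrade now instead of returning to userspace
		 * and taking a second fault. */
		if (ret & VM_FAULT_READONLY)
			model_wp_page(f);
		return ret & ~VM_FAULT_READONLY;
	}

	/* Existing path: mkwrite, then install the page writable. */
	f->mkwrite_calls++;
	f->pte.present = true;
	f->pte.writable = true;
	return ret;
}
```

In this model both paths end with a writable PTE after a single pass and
with the mkwrite step run exactly once, which is the property the
reflink question above cares about. Locking, PTE revalidation, and error
handling are all left out.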

--Andy