On Fri, Aug 09, 2013 at 07:02:12PM +0400, Maxim Patlasov wrote:08/06/2013 08:25 PM, Miklos Szeredi ÐÐÑÐÑ:The problem is: if we need it in ->writepages, we need it in ->writepage too.Hmm. Direct IO on an mmaped file will do get_user_pages() which willYes, I agree.
do the necessary page fault magic and ->page_mkwrite() will be called.
At least AFAICS.
The page cannot become dirty through a memory mapping without firstYes, that's correct, but I don't understand why you disregard two
switching the pte from read-only to read-write first. Page accounting
logic relies on this too. The other way the page can become dirty is
through write(2) on the fs. But we do get notified about that too.
other cases of marking page dirty (both related to direct AIO read
from a file to a memory region mmap-ed to a fuse file):
1. dio_bio_submit() -->
bio_set_pages_dirty() -->
set_page_dirty_lock()
2. dio_bio_complete() -->
bio_check_pages_dirty() -->
bio_dirty_fn() -->
bio_set_pages_dirty() -->
set_page_dirty_lock()
As soon as a page became dirty through a memory mapping (exactly as
you explained), nothing would prevent it to be written-back. And
fuse will call end_page_writeback almost immediately after copying
the real page to a temporary one. Then dio_bio_submit may re-dirty
page speculatively w/o notifying fuse. And again, since then nothing
would prevent it to be written-back once more. Hence we can end up
in more then one temporary page in fuse write-back. And similar
concern for dio_bio_complete() re-dirty.
This make me think that we do need fuse_page_is_writeback() in
fuse_writepages_fill(). But it shouldn't be harmful because it will
no-op practically always due to waiting for fuse writeback in
->page_mkwrite() and in course of handling write(2).
And that's where we can't have it because it would deadlock in reclaim.
+ if (fuse_page_is_writeback(inode, page->index)) {
+ if (wbc->sync_mode != WB_SYNC_ALL) {
+ redirty_page_for_writepage(wbc, page);
+ return 0;
+ }
+ fuse_wait_on_page_writeback(inode, page->index);
+ }
There's a way to work around this:
- if the request is still in queue, just update it with the contents of the
new page
- if the request already in userspace, create a new reqest, but only let
userspace have it once the previous request for the same page completes, so
the ordering is not messed up
But that's a lot of hairy code.
Any other ideas?
The best would be if we could get rid of the ugly temporary page requirement for
fuse writeback. The big blocker has always been direct reclaim: allocation must
not wait on fuse writebacks. AFAICS there's still a wait_on_page_writeback() in
relation to memcg. And it interacts with page migration which also has them.
Those are the really difficult ones...
The other offender, balance_dirty_pages() is much easier and needs to be tought
about fuse behavior anyway.