Re: [RFC PATCH 1/8] mm: mmap: unmap large mapping by section

From: Yang Shi
Date: Thu Mar 22 2018 - 12:47:08 EST




On 3/22/18 9:18 AM, Laurent Dufour wrote:

On 22/03/2018 17:05, Matthew Wilcox wrote:
On Thu, Mar 22, 2018 at 04:54:52PM +0100, Laurent Dufour wrote:
On 22/03/2018 16:40, Matthew Wilcox wrote:
On Thu, Mar 22, 2018 at 04:32:00PM +0100, Laurent Dufour wrote:
Regarding the page fault, why not rely on the PTE locking?

When munmap() clears the PTE it will have to hold the PTE lock, so this
will serialize the access.
If the page fault occurs before the mmap(MAP_FIXED), the mapped page will be
removed when mmap(MAP_FIXED) does the cleanup. Fair enough.
The page fault handler will walk the VMA tree to find the correct
VMA and then find that the VMA is marked as deleted. If it assumes
that the VMA has been deleted because of munmap(), then it can raise
SIGSEGV immediately. But if the VMA is marked as deleted because of
mmap(MAP_FIXED), it must wait until the new VMA is in place.
I'm wondering if such complexity is required.
If a user space process tries to access a page that another thread is
overwriting through mmap(MAP_FIXED), there is no guarantee whether it will
manipulate the *old* page or the *new* one.
Right; but it must return one or the other, it can't segfault.
Good point, I missed that...

I'd think this is up to the user process to handle that concurrency.
What needs to be guaranteed is that once mmap(MAP_FIXED) returns, the old pages
are no longer there, which is done through the mmap_sem and PTE locking.
Yes, and allowing the fault handler to return the *old* page risks the
old page being reinserted into the page tables after the unmapping task
has done its work.
The PTE locking should prevent that.

It's *really* rare to page-fault on a VMA which is in the middle of
being replaced. Why are you trying to optimise it?
I was not trying to optimize it, but to avoid waiting in the page fault handler.
Waiting could become tricky if the VMA is removed once mmap(MAP_FIXED) is
done and before the waiting page fault is woken up. This means that the
removed VMA structure will have to remain until all the waiters are woken up,
which implies a ref_count or similar.

We may not need a ref_count. After removing the "locked-for-deletion" vmas when mmap(MAP_FIXED) is done, just wake up the page fault so that it looks up the vma again; it will then find the new vma installed by mmap(MAP_FIXED), right?

I'm not sure whether a completion can do this, since I'm not quite familiar with it :-(
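
To make the re-lookup idea concrete, here is a rough userspace sketch, not kernel code. Everything in it is a hypothetical stand-in: the toy_* names are invented, a pthread mutex plays the role of mmap_sem, a pthread condition variable plays the role of the completion, and the "VMA tree" is a single slot. The point it tries to show is that the sleeping fault never keeps a pointer to the old vma across the wait; it looks the address up again after being woken, so the old vma would not need a ref_count in this model.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct toy_vma {
    unsigned long start, end;   /* covers [start, end) */
    bool locked;                /* "locked-for-deletion" marker */
    int generation;             /* tells the old mapping from the new one */
};

struct toy_mm {
    pthread_mutex_t lock;       /* stands in for mmap_sem */
    pthread_cond_t unlocked;    /* stands in for a completion */
    struct toy_vma *vma;        /* one-slot "VMA tree" */
};

static struct toy_vma *toy_find_vma(struct toy_mm *mm, unsigned long addr)
{
    struct toy_vma *v = mm->vma;
    return (v && addr >= v->start && addr < v->end) ? v : NULL;
}

/*
 * Fault path: returns the generation of the vma that served the fault,
 * or -1 where the real handler would raise SIGSEGV.
 */
int toy_fault(struct toy_mm *mm, unsigned long addr)
{
    int gen = -1;

    pthread_mutex_lock(&mm->lock);
    for (;;) {
        struct toy_vma *v = toy_find_vma(mm, addr);
        if (!v)
            break;                      /* no mapping: SIGSEGV */
        if (!v->locked) {
            gen = v->generation;        /* usable vma: handle the fault */
            break;
        }
        /* Sleep, then loop to look the address up again; 'v' is not reused. */
        pthread_cond_wait(&mm->unlocked, &mm->lock);
    }
    pthread_mutex_unlock(&mm->lock);
    return gen;
}

/* Unmapping side, step 1: mark the current vma as locked. */
void toy_lock_for_replace(struct toy_mm *mm)
{
    pthread_mutex_lock(&mm->lock);
    if (mm->vma)
        mm->vma->locked = true;
    pthread_mutex_unlock(&mm->lock);
}

/* Step 2: drop the old vma, install the new one (or NULL), wake all waiters. */
void toy_install_and_wake(struct toy_mm *mm, struct toy_vma *new_vma)
{
    assert(new_vma == NULL || !new_vma->locked);
    pthread_mutex_lock(&mm->lock);
    mm->vma = new_vma;
    pthread_cond_broadcast(&mm->unlocked);
    pthread_mutex_unlock(&mm->lock);
}

/* Helper so a fault can be run from another thread. */
struct toy_fault_req {
    struct toy_mm *mm;
    unsigned long addr;
    int result;
};

void *toy_fault_thread(void *arg)
{
    struct toy_fault_req *req = arg;
    req->result = toy_fault(req->mm, req->addr);
    return NULL;
}
```

In this model a fault that races with the replacement either sleeps and then finds the new vma, or arrives late and finds the new vma directly. If toy_install_and_wake() is called with NULL (the plain munmap() case), the woken fault finds nothing and returns -1. Whether the kernel's completion API maps onto this cleanly is exactly the part I am unsure about.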

Yang


I think I was wrong to describe VMAs as being *deleted*. I think we
instead need the concept of a *locked* VMA that page faults will block on.
Conceptually, it's a per-VMA rwsem, but I'd use a completion instead of
an rwsem since the only reason to write-lock the VMA is because it is
being deleted.
Such a lock would only make sense in the case of mmap(MAP_FIXED), since when
the VMA is simply removed there is no need to wait. Is that right?
I can't think of another reason. I suppose we could mark the VMA as
locked-for-deletion or locked-for-replacement and have the SIGSEGV happen
early. But I'm not sure that optimising for SIGSEGVs is a worthwhile
use of our time. Just always have the pagefault sleep for a deleted VMA.
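
The two policies discussed above can be compared with a small decision-function sketch. This is only an illustration in plain C: the toy_* enums and functions are hypothetical names invented here, and no kernel structure or API is implied. The first function models "always sleep on a locked VMA"; the second models the optional variant that distinguishes locked-for-deletion from locked-for-replacement and raises SIGSEGV early.

```c
#include <stddef.h>

/* Hypothetical states; the real patch set may represent this differently. */
enum toy_lock_reason {
    TOY_UNLOCKED,
    TOY_LOCKED_FOR_DELETION,      /* munmap() in progress */
    TOY_LOCKED_FOR_REPLACEMENT    /* mmap(MAP_FIXED) in progress */
};

enum toy_fault_action {
    TOY_HANDLE_FAULT,             /* VMA is usable, service the fault */
    TOY_SLEEP_THEN_RETRY,         /* block, then look the VMA up again */
    TOY_SIGSEGV                   /* no usable mapping */
};

struct toy_vma_state {
    enum toy_lock_reason reason;
};

/* Simple policy: any locked VMA makes the fault sleep. */
enum toy_fault_action toy_fault_simple(const struct toy_vma_state *vma)
{
    if (!vma)
        return TOY_SIGSEGV;
    return vma->reason == TOY_UNLOCKED ? TOY_HANDLE_FAULT
                                       : TOY_SLEEP_THEN_RETRY;
}

/* Variant: SIGSEGV immediately when the VMA is only being deleted. */
enum toy_fault_action toy_fault_early_segv(const struct toy_vma_state *vma)
{
    if (!vma || vma->reason == TOY_LOCKED_FOR_DELETION)
        return TOY_SIGSEGV;
    return vma->reason == TOY_UNLOCKED ? TOY_HANDLE_FAULT
                                       : TOY_SLEEP_THEN_RETRY;
}
```

Under the simple policy a fault on a VMA that is being deleted still ends in SIGSEGV: it sleeps, retries the lookup, and finds no VMA. The only difference is how soon the signal arrives, which is why the extra state looks hard to justify.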