Re: [RFC v3 PATCH 4/5] mm: mmap: zap pages with read mmap_sem for large mapping

From: Yang Shi
Date: Mon Jul 02 2018 - 20:01:58 EST

On 6/29/18 9:26 PM, Yang Shi wrote:

On 6/29/18 8:15 PM, Andrew Morton wrote:
On Fri, 29 Jun 2018 19:28:15 -0700 Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> wrote:

we're adding a bunch of code to 32-bit kernels which will never be

I'm thinking it would be better to be much more explicit with "#ifdef
CONFIG_64BIT" in this code, rather than relying upon the above magic.

But I tend to think that the fact that we haven't solved anything on
locked vmas or on uprobed mappings is a shostopper for the whole
approach :(
I agree it is not that perfect. But, it still could improve the most use
Well, those unaddressed usecases will need to be fixed at some point.

Yes, definitely.

What's our plan for that?

As I mentioned in the earlier email, locked and hugetlb cases might be able to be solved by separating vm_flags update and actual unmap. I will look into it further later.

By looking into this furtherï I think both mlocked and hugetlb vmas can be handled.

For mlocked vmas, it is easy since we acquires write mmap_sem before unmapping, so VM_LOCK flags can be cleared here then unmap, just like what the regular path does.

For hugetlb vmas, the VM_MAYSHARE flag is just checked by huge_pmd_share() in hugetlb_fault()->huge_pte_alloc(), another call site is dup_mm()->copy_page_range()->copy_hugetlb_page_range(), we don't care this call chain in this case.

So we may expand VM_DEAD to hugetlb_fault(). Michal suggested to check VM_DEAD in check_stable_address_space(), so it would be called in hugetlb_fault() too (not in current code), then the page fault handler would bail out before huge_pte_alloc() is called.

With this trick, we don't have to care about when the vm_flags is updated, we can unmap hugetlb vmas in read mmap_sem critical section, then update the vm_flags with write mmap_sem held or before the unmap.


From my point of view, uprobe mapping sounds not that vital.

Would one of your earlier designs have addressed all usecases? I
expect the dumb unmap-a-little-bit-at-a-time approach would have?

Yes. The v1 design does unmap with holding write map_sem. So, the vm_flags update is not a problem.


For the locked vmas and hugetlb vmas, unmapping operations need modify
vm_flags. But, I'm wondering we might be able to separate unmap and
vm_flags update. Because we know they will be unmapped right away, the
vm_flags might be able to be updated in write mmap_sem critical section
before the actual unmap is called or after it. This is just off the top
of my head.

For uprobed mappings, I'm not sure how vital it is to this case.