Re: [PATCH v2 08/10] mm/hugetlb: Make walk_hugetlb_range() safe to pmd unshare

From: David Hildenbrand
Date: Fri Dec 09 2022 - 05:26:14 EST


On 08.12.22 21:47, Peter Xu wrote:
On Thu, Dec 08, 2022 at 02:14:42PM +0100, David Hildenbrand wrote:
On 07.12.22 21:30, Peter Xu wrote:
Since walk_hugetlb_range() walks the pgtable, it needs the vma lock
to make sure the pgtable page will not be freed concurrently.

Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
---
arch/s390/mm/gmap.c | 2 ++
fs/proc/task_mmu.c | 2 ++
include/linux/pagewalk.h | 11 ++++++++++-
mm/hmm.c | 15 ++++++++++++++-
mm/pagewalk.c | 2 ++
5 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 8947451ae021..292a54c490d4 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2643,7 +2643,9 @@ static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr,
end = start + HPAGE_SIZE - 1;
__storage_key_init_range(start, end);
set_bit(PG_arch_1, &page->flags);
+ hugetlb_vma_unlock_read(walk->vma);
cond_resched();
+ hugetlb_vma_lock_read(walk->vma);
return 0;
}
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e35a0398db63..cf3887fb2905 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1613,7 +1613,9 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
frame++;
}
+ hugetlb_vma_unlock_read(walk->vma);
cond_resched();
+ hugetlb_vma_lock_read(walk->vma);

We already hold the mmap_lock and reschedule. Even without the
cond_resched() we might happily reschedule on a preemptive kernel. So I'm
not sure if this optimization is strictly required or even helpful in
practice here.

It's just low hanging fruit if we need that complexity anyway.

That's also why I didn't do that for v1 (where I missed hmm special case,
though..), but I think since we'll need that anyway, we'd better release
the vma lock if we can easily do so.

mmap_lock is just more special because it needs more work in the caller to
release (e.g. vma invalidations). Otherwise I'm happy dropping that too.


In the worst case, concurrent unsharing would have to wait.
For example, s390_enable_skey() is called at most once for a VM, for most
VMs it gets never even called.

Or am I missing something important?

Nothing important. I just don't see why we need to strictly follow the
same release rule of mmap_lock here when talking about vma lock.

In short - if we can drop a lock earlier before sleep, why not?

I tend to just keep it as-is, but let me know if you have further thoughts
or concerns.

To me this looks like a possibly unnecessary optimization, requiring additional code. IMHO, possibly unnecessary unlock+relock makes thatthat code harder to get.

For such cases, it would be good to have any evidence that it really helps.

--
Thanks,

David / dhildenb