Re: [PATCH v3 11/16] mm/mmap: Track start and end of munmap in vma_munmap_struct

From: Suren Baghdasaryan
Date: Wed Jul 10 2024 - 13:14:54 EST


On Fri, Jul 5, 2024 at 1:27 PM Lorenzo Stoakes
<lorenzo.stoakes@xxxxxxxxxx> wrote:
>
> On Thu, Jul 04, 2024 at 02:27:13PM GMT, Liam R. Howlett wrote:
> > From: "Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx>
> >
> > Set the start and end address for munmap when the prev and next are
> > gathered. This is needed to avoid incorrect addresses being used during
> > the vms_complete_munmap_vmas() function if the prev/next vma are
> > expanded.
>
> When we spoke about this separately you mentioned that specific arches may
> be more likely to encounter this issue, perhaps worth mentioning something
> about that in the commit msg? Unless I misunderstood you.
>
> >
> > Add a new helper vms_complete_pte_clear(), which is needed later and
> > will avoid growing the argument list to unmap_region() beyond the 9 it
> > already has.
>
> My word.
>
> >
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
> > ---
> > mm/internal.h | 2 ++
> > mm/mmap.c | 34 +++++++++++++++++++++++++++-------
> > 2 files changed, 29 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 8cbbbe7d40f3..4c9f06669cc4 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -1493,6 +1493,8 @@ struct vma_munmap_struct {
> > struct list_head *uf; /* Userfaultfd list_head */
> > unsigned long start; /* Aligned start addr */
> > unsigned long end; /* Aligned end addr */
> > + unsigned long unmap_start;
> > + unsigned long unmap_end;
> > int vma_count; /* Number of vmas that will be removed */
> > unsigned long nr_pages; /* Number of pages being removed */
> > unsigned long locked_vm; /* Number of locked pages */
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index ecf55d32e804..45443a53be76 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -525,6 +525,8 @@ static inline void init_vma_munmap(struct vma_munmap_struct *vms,
> > vms->vma_count = 0;
> > vms->nr_pages = vms->locked_vm = vms->nr_accounted = 0;
> > vms->exec_vm = vms->stack_vm = vms->data_vm = 0;
> > + vms->unmap_start = FIRST_USER_ADDRESS;
> > + vms->unmap_end = USER_PGTABLES_CEILING;
> > }
> >
> > /*
> > @@ -2610,6 +2612,26 @@ static inline void abort_munmap_vmas(struct ma_state *mas_detach)
> > __mt_destroy(mas_detach->tree);
> > }
> >
> > +
> > +static void vms_complete_pte_clear(struct vma_munmap_struct *vms,
> > + struct ma_state *mas_detach, bool mm_wr_locked)
> > +{
> > + struct mmu_gather tlb;
> > +
> > + /*
> > + * We can free page tables without write-locking mmap_lock because VMAs
> > + * were isolated before we downgraded mmap_lock.
> > + */
> > + mas_set(mas_detach, 1);
> > + lru_add_drain();
> > + tlb_gather_mmu(&tlb, vms->mm);
> > + update_hiwater_rss(vms->mm);
> > + unmap_vmas(&tlb, mas_detach, vms->vma, vms->start, vms->end, vms->vma_count, mm_wr_locked);
> > + mas_set(mas_detach, 1);
>
> I know it's necessary as unmap_vmas() will adjust mas_detach, but it kind
> of aesthetically sucks to set it to 1, do some stuff, then set it to 1
> again. But this is not a big deal :>)
>
> > + free_pgtables(&tlb, mas_detach, vms->vma, vms->unmap_start, vms->unmap_end, mm_wr_locked);
>
> Yeah this bit definitely needs a comment I think, this is very confusing
> indeed. Under what circumstances will these differ from [vms->start,
> vms->end), etc.?
>
> I'm guessing it's to do with !vms->prev and !vms->next needing to be set to
> [FIRST_USER_ADDRESS, USER_PGTABLES_CEILING)?
>
> > + tlb_finish_mmu(&tlb);
> > +}
> > +
> > /*
> > * vms_complete_munmap_vmas() - Finish the munmap() operation
> > * @vms: The vma munmap struct
> > @@ -2631,13 +2653,7 @@ static void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
> > if (vms->unlock)
> > mmap_write_downgrade(mm);
> >
> > - /*
> > - * We can free page tables without write-locking mmap_lock because VMAs
> > - * were isolated before we downgraded mmap_lock.
> > - */
> > - mas_set(mas_detach, 1);
> > - unmap_region(mm, mas_detach, vms->vma, vms->prev, vms->next,
> > - vms->start, vms->end, vms->vma_count, !vms->unlock);
> > + vms_complete_pte_clear(vms, mas_detach, !vms->unlock);
> > /* Update high watermark before we lower total_vm */
> > update_hiwater_vm(mm);
> > /* Stat accounting */
> > @@ -2699,6 +2715,8 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
> > goto start_split_failed;
> > }
> > vms->prev = vma_prev(vms->vmi);
> > + if (vms->prev)
> > + vms->unmap_start = vms->prev->vm_end;
> >
> > /*
> > * Detach a range of VMAs from the mm. Using next as a temp variable as
> > @@ -2757,6 +2775,8 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
> > }
> >
> > vms->next = vma_next(vms->vmi);
> > + if (vms->next)
> > + vms->unmap_end = vms->next->vm_start;
> >
> > #if defined(CONFIG_DEBUG_VM_MAPLE_TREE)
> > /* Make sure no VMAs are about to be lost. */
> > --
> > 2.43.0
> >
>
> Other than wanting some extra comments, this looks fine and I know how
> hard-won the unmap range bit of this change was so:
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>

OK, this is another case where the code duplication will be removed in the next patch. LGTM.
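
On the question above about when [unmap_start, unmap_end) differs from
[start, end): as I read the hunks, it differs whenever there is a hole
between the unmapped range and a neighbouring VMA, or when a neighbour is
missing. Below is a minimal userspace sketch of that selection, not the
kernel code itself. The sketch_* names and the two SKETCH_* constants are
hypothetical stand-ins (the real FIRST_USER_ADDRESS and
USER_PGTABLES_CEILING are architecture-specific), and the assert is my own
sanity check, not something the patch adds.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in values for illustration only, not the kernel's definitions. */
#define SKETCH_FIRST_USER_ADDRESS    0UL
#define SKETCH_USER_PGTABLES_CEILING (~0UL)

/* Hypothetical, stripped-down VMA: only the bounds matter here. */
struct sketch_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/* Half-open range [start, end) handed to the page table freeing step. */
struct sketch_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Pick the page table free range for a munmap of [start, end).
 * prev and next are the neighbouring VMAs; either may be NULL.
 * This mirrors the defaults set in init_vma_munmap() and the two
 * overrides added to vms_gather_munmap_vmas() in the patch.
 */
static struct sketch_range sketch_pgtable_range(const struct sketch_vma *prev,
						const struct sketch_vma *next,
						unsigned long start,
						unsigned long end)
{
	struct sketch_range r = {
		.start = SKETCH_FIRST_USER_ADDRESS,
		.end = SKETCH_USER_PGTABLES_CEILING,
	};

	if (prev)
		r.start = prev->vm_end;
	if (next)
		r.end = next->vm_start;

	/* With these stand-in bounds the free range covers the unmap range. */
	assert(r.start <= start && end <= r.end);
	return r;
}
```

With prev = [0x1000, 0x3000), next = [0x9000, 0xa000) and an unmap of
[0x5000, 0x6000), the sketch yields [0x3000, 0x9000), which is wider than
the unmapped range. Recording the values at gather time means a later
expansion of prev or next cannot change them, which is the point made in
the commit message.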

Reviewed-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>