Re: [PATCH v3 16/16] mm/mmap: Move may_expand_vm() check in mmap_region()

From: Lorenzo Stoakes
Date: Tue Jul 09 2024 - 10:46:21 EST


On Mon, Jul 08, 2024 at 04:43:15PM GMT, Liam R. Howlett wrote:
> * Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> [240708 08:53]:
> > On Thu, Jul 04, 2024 at 02:27:18PM GMT, Liam R. Howlett wrote:
> > > From: "Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx>
> > >
> > > The MAP_FIXED page count is available after the vms_gather_munmap_vmas()
> > > call, so use it instead of looping over the vmas twice.
> >
> > Predictably indeed you removed the thing I commented on in the last patch
> > ;) but at least this time I predicted it! ;)
> >
> > >
> > > Signed-off-by: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
> > > ---
> > > mm/mmap.c | 36 ++++--------------------------------
> > > 1 file changed, 4 insertions(+), 32 deletions(-)
> > >
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index b2de26683903..62edaabf3987 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -400,27 +400,6 @@ anon_vma_interval_tree_post_update_vma(struct vm_area_struct *vma)
> > > anon_vma_interval_tree_insert(avc, &avc->anon_vma->rb_root);
> > > }
> > >
> > > -static unsigned long count_vma_pages_range(struct mm_struct *mm,
> > > - unsigned long addr, unsigned long end,
> > > - unsigned long *nr_accounted)
> > > -{
> > > - VMA_ITERATOR(vmi, mm, addr);
> > > - struct vm_area_struct *vma;
> > > - unsigned long nr_pages = 0;
> > > -
> > > - *nr_accounted = 0;
> > > - for_each_vma_range(vmi, vma, end) {
> > > - unsigned long vm_start = max(addr, vma->vm_start);
> > > - unsigned long vm_end = min(end, vma->vm_end);
> > > -
> > > - nr_pages += PHYS_PFN(vm_end - vm_start);
> > > - if (vma->vm_flags & VM_ACCOUNT)
> > > - *nr_accounted += PHYS_PFN(vm_end - vm_start);
> > > - }
> > > -
> > > - return nr_pages;
> > > -}
> > > -
> > > static void __vma_link_file(struct vm_area_struct *vma,
> > > struct address_space *mapping)
> > > {
> > > @@ -2946,17 +2925,6 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > > pgoff_t vm_pgoff;
> > > int error = -ENOMEM;
> > > VMA_ITERATOR(vmi, mm, addr);
> > > - unsigned long nr_pages, nr_accounted;
> > > -
> > > - nr_pages = count_vma_pages_range(mm, addr, end, &nr_accounted);
> > > -
> > > - /* Check against address space limit. */
> > > - /*
> > > - * MAP_FIXED may remove pages of mappings that intersects with requested
> > > - * mapping. Account for the pages it would unmap.
> > > - */
> > > - if (!may_expand_vm(mm, vm_flags, pglen - nr_pages))
> > > - return -ENOMEM;
> > >
> > > if (unlikely(!can_modify_mm(mm, addr, end)))
> > > return -EPERM;
> > > @@ -2987,6 +2955,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > > vma_iter_next_range(&vmi);
> > > }
> > >
> > > + /* Check against address space limit. */
> > > + if (!may_expand_vm(mm, vm_flags, pglen - vms.nr_pages))
> > > + goto abort_munmap;
> > > +
> >
> > I know you can literally only do this after the vms_gather_munmap_vmas(),
> > but this does change where we check this, so for instance we do
> > arch_unmap() without having checked may_expand_vm().
> >
> > However I assume this is fine?
>
> Thanks for pointing this out.
>
> The functionality here has changed
> --- from ---
> may_expand_vm() check
> can_modify_mm() check
> arch_unmap()
> vms_gather_munmap_vmas()
> ...
>
> --- to ---
> can_modify_mm() check
> arch_unmap()
> vms_gather_munmap_vmas()
> may_expand_vm() check
> ...
>
> vms_gather_munmap_vmas() does nothing but figures out what to do later,
> but could use memory and can fail.
>
> The user implications are:
>
> 1. The return type on the error may change to -EPERM from -ENOMEM, if
> you are not allowed to expand and are trying to overwrite mseal()'ed
> VMAs. That seems so very rare that I'm not sure it's worth mentioning.
>
>
> 2. arch_unmap() called prior to may_expand_vm().
> powerpc uses this to set mm->context.vdso = NULL if mm->context.vdso is
> within the unmap range. User implication of this means that an
> application my set the vdso to NULL prior to hitting the -ENOMEM case in
> may_expand_vm() due to the address space limit.
>
> Assuming the removal of the vdso does not cause the application to seg
> fault, then the user visible change is that any vdso call after a failed
> mmap(MAP_FIXED) call would result in a seg fault. The only reason it
> would fail is if the mapping process was attempting to map a large
> enough area over the vdso (which is accounted and in the vma tree,
> afaict) and ran out of memory. Note that this situation could arise
> already since we could run out of memory (not accounting) after the
> arch_unmap() call within the kernel.
>
> The code today can suffer the same fate, but not by the accounting
> failure. It can happen due to failure to allocate a new vma,
> do_vmi_munmap() failure after the arch_unmap() call, or any of the other
> failure scenarios later in the mmap_region() function.
>
> At the very least, this requires an expanded change log.

Indeed, also (as mentioned on IRC) I feel like we need to look at whether
we _truly_ need this arch_unmap() call for a single, rather antiquated,
architecture.

I mean why are they unmapping the VDSO, why is that valid, why does it need
that field to be set to NULL, is it possible to signify that in some other
way etc.?

Regardless, I think the change you make here is fine and shouldn't be a
blocker for your changes at all.

But agreed, I do think expanding the change log a bit would be helpful.

>
> >
> > > /*
> > > * Private writable mapping: check memory availability
> > > */
> > > --
> > > 2.43.0
> > >
> >
> > Looks good to me generally,
> >
> > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>