Re: [PATCH v3 15/15] mm/mmap: Change vma iteration order in do_vmi_align_munmap()
From: Liam R. Howlett
Date: Mon Aug 14 2023 - 15:19:55 EST
* Jann Horn <jannh@xxxxxxxxxx> [230814 11:44]:
> @akpm
>
> On Mon, Jul 24, 2023 at 8:31 PM Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote:
> > Since prev will be set later in the function, it is better to reverse
> > the splitting direction of the start VMA (modify the new_below argument
> > to __split_vma).
>
> It might be a good idea to reorder "mm: always lock new vma before
> inserting into vma tree" before this patch.
>
> If you apply this patch without "mm: always lock new vma before
> inserting into vma tree", I think move_vma(), when called with a start
> address in the middle of a VMA, will behave like this:
>
> - vma_start_write() [lock the VMA to be moved]
> - move_page_tables() [moves page table entries]
> - do_vmi_munmap()
> - do_vmi_align_munmap()
> - __split_vma()
> - creates a new VMA **covering the moved range** that is **not locked**
> - stores the new VMA in the VMA tree **without locking it** [1]
> - new VMA is locked and removed again [2]
> [...]
>
> So after the page tables in the region have already been moved, I
> believe there will be a brief window (between [1] and [2]) where page
> faults in the region can happen again, which could probably cause new
> page tables and PTEs to be created in the region again in that window.
> (This can't happen in Linus' current tree because the new VMA created
> by __split_vma() only covers the range that is not being moved.)
Ah, so my reversing of which VMA to keep to the first split call opens a
window where the VMA being removed is not locked. Good catch.
>
> Though I guess that's not going to lead to anything bad, since
> do_vmi_munmap() anyway cleans up PTEs and page tables in the region?
> So maybe it's not that important.
do_vmi_munmap() will clean up PTEs from the end of the previous VMA to
the start of the next (even over areas without VMA coverages - because
there are platforms what need this, apparently).
I don't have any objections in the ordering or see an issue resulting
from having it this way... Except for maybe lockdep, so maybe we should
change the ordering of the patch sets just to be safe?
In fact, should we add another check somewhere to ensure we do generate
the warning? Perhaps to remove_mt() to avoid the exit path hitting it?
Thanks,
Liam