Re: [PATCH] mm/khugepaged: Fix ->anon_vma race

From: Jann Horn
Date: Tue Jan 17 2023 - 15:26:21 EST


On Wed, Jan 11, 2023 at 2:33 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires
> it to be locked. retract_page_tables() bails out if an ->anon_vma is
> attached, but does this check before holding the mmap lock (as the comment
> above the check explains).

@akpm please replace the commit message with the following, and maybe
also add a "Link:" entry pointing to
https://lore.kernel.org/linux-mm/CAG48ez3434wZBKFFbdx4M9j6eUwSUVPd4dxhzW_k_POneSDF+A@xxxxxxxxxxxxxx/
for the reproducer.


If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires
it to be locked.
Page table traversal is allowed under any one of the mmap lock, the
anon_vma lock (if the VMA is associated with an anon_vma), and the
mapping lock (if the VMA is associated with a mapping); and so to be
able to remove page tables, we must hold all three of them.
retract_page_tables() bails out if an ->anon_vma is attached, but
does this check before holding the mmap lock (as the comment above
the check explains).

If we racily merge an existing ->anon_vma (shared with a child process)
from a neighboring VMA, subsequent rmap traversals on pages belonging to
the child will be able to see the page tables that we are concurrently
removing while assuming that nothing else can access them.

Repeat the ->anon_vma check once we hold the mmap lock to ensure that there
really is no concurrent page table access.

Hitting this bug causes a lockdep warning in collapse_and_free_pmd(),
in the line
"lockdep_assert_held_write(&vma->anon_vma->root->rwsem)".
It can also lead to use-after-free access.