Re: [BUG] page table UAF, Re: [PATCH v8 14/21] mm/mmap: Avoid zeroing vma tree in mmap_region()

From: Liam R. Howlett
Date: Fri Oct 11 2024 - 10:26:53 EST


* Jann Horn <jannh@xxxxxxxxxx> [241008 13:16]:
...
> > > > >
> > > > > task 1 (mmap, MAP_FIXED)     task 2 (ftruncate)
> > > > > ========================     ==================
> > > > > mmap_region
> > > > >   vma_merge_new_range
> > > > >     vma_expand
> > > > >       commit_merge
> > > > >         vma_prepare
> > > > >           [take rmap locks]
> > > > >         vma_set_range
> > > > >           [expand adjacent mapping]
> > > > >         vma_complete
> > > > >           [drop rmap locks]
> > > > >   vms_complete_munmap_vmas
> > > > >     vms_clear_ptes
> > > > >       unmap_vmas
> > > > >         [removes ptes]
> > > > >       free_pgtables
> > > > >         [unlinks old vma from rmap]
> > > > >                              unmap_mapping_range
> > > > >                                unmap_mapping_pages
> > > > >                                  i_mmap_lock_read
> > > > >                                  unmap_mapping_range_tree
> > > > >                                    [loop]
> > > > >                                      unmap_mapping_range_vma
> > > > >                                        zap_page_range_single
> > > > >                                          unmap_single_vma
> > > > >                                            unmap_page_range
> > > > >                                              zap_p4d_range
> > > > >                                                zap_pud_range
> > > > >                                                  zap_pmd_range
> > > > >                                                    [looks up pmd entry]
> > > > >         free_pgd_range
> > > > >           [frees pmd]
> > > > >                                                    [UAF pmd entry access]
> > > > >
> > > > > To reproduce this, apply the attached mmap-vs-truncate-racewiden.diff
> > > > > to widen the race windows, then build and run the attached reproducer
> > > > > mmap-fixed-race.c.
> > > > >
> > > > > Under a kernel with KASAN, you should ideally get a KASAN splat like this:
> > > >
...

>
> Or you could basically unmap the VMA while it is still in the VMA tree
> but is already locked and marked as detached? So first you do
> unmap_vmas() and free_pgtables() (which clears the PTEs, removes the
> rmap links, and deletes page tables), then prepare the new VMAs, and
> then replace the old VMA's entries in the VMA tree with the new
> entries? I guess in the end the result would semantically be pretty
> similar to having markers in the maple tree.
>

After trying a few other methods, I ended up doing something like you
said above. I already had to do this if call_mmap() was to be used, so
the code change isn't that large. Doing it unconditionally on MAP_FIXED
seems like the safest plan.

The other methods were unsuccessful due to the locking order that exists
in fsreclaim and other areas.

Basically, the vma tree will not see a gap, but the rmap will see a gap.
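
As a rough illustration of that ordering, here is a toy userspace model.
It is not kernel code: every type and function name below (toy_vma,
toy_range, toy_replace_fixed, and so on) is hypothetical and exists only
for this sketch. It assumes the sequence described in the quoted
suggestion: drop the rmap link and free the page tables while the old VMA
is still in the tree, then swap the tree entry in one store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or functions are kernel APIs. */
struct toy_vma {
	int id;
};

/* One address range, seen through the two lookup paths. */
struct toy_range {
	const struct toy_vma *tree;	/* what a VMA tree lookup would return */
	const struct toy_vma *rmap;	/* what an rmap walk would reach */
	bool pgtables_present;		/* page tables still allocated */
};

/* What a concurrent observer could have seen across all steps. */
struct toy_observation {
	bool tree_gap_seen;		/* a tree lookup found nothing */
	bool rmap_gap_seen;		/* an rmap walk found nothing */
	bool freed_while_rmap_reachable; /* the hazard from the race above */
};

/* Record what each lookup path would find at this instant. */
static void toy_observe(const struct toy_range *r, struct toy_observation *obs)
{
	if (!r->tree)
		obs->tree_gap_seen = true;
	if (!r->rmap)
		obs->rmap_gap_seen = true;
}

/* Replace the VMA covering the range, as a MAP_FIXED mmap would. */
void toy_replace_fixed(struct toy_range *r, const struct toy_vma *new_vma,
		       struct toy_observation *obs)
{
	assert(r->tree != NULL);

	/* Step 1: unlink the old VMA from the rmap; it stays in the tree. */
	r->rmap = NULL;
	toy_observe(r, obs);

	/* Step 2: free page tables; flag it if an rmap walk could still get here. */
	if (r->rmap)
		obs->freed_while_rmap_reachable = true;
	r->pgtables_present = false;
	toy_observe(r, obs);

	/* Step 3: replace the old tree entry with the new one in a single store. */
	r->tree = new_vma;
	toy_observe(r, obs);

	/* Step 4: only now make the new VMA reachable through the rmap. */
	r->rmap = new_vma;
	toy_observe(r, obs);
}
```

Starting from a range whose tree and rmap entries both point at the old
VMA, toy_replace_fixed() ends with tree_gap_seen still false and
rmap_gap_seen set, which is the asymmetry described above.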

Unfortunately, this increases the number of failure cases that cannot be
undone with my design, though there are still fewer than existed before.
Sadly, most errors will generate the historic vma gap.

Thanks,
Liam