Re: [PATCH v6 10/16] mm: replace vm_lock and detached flag with a reference count

From: Peter Zijlstra
Date: Thu Dec 19 2024 - 04:14:04 EST


On Wed, Dec 18, 2024 at 01:53:17PM -0800, Suren Baghdasaryan wrote:

> Ah, ok I see now. I completely misunderstood what for_each_vma_range()
> was doing.
>
> Then I think vma_start_write() should remain inside
> vms_gather_munmap_vmas() and all vmas in mas_detach should be

No, it must not. You really are not modifying anything yet (except the
splits, which, as already noted, write-lock themselves).

> write-locked, even the ones we are not modifying. Otherwise what would
> prevent the race I mentioned before?
>
> __mmap_region
>   __mmap_prepare
>     vms_gather_munmap_vmas // adds vmas to be unmapped into mas_detach,
>                            // some locked by __split_vma(), some not locked
>
>                 lock_vma_under_rcu()
>                   vma = mas_walk // finds unlocked vma also in mas_detach
>                   vma_start_read(vma) // succeeds since vma is not locked
>                   // vma->detached, vm_start, vm_end checks pass
>                   // vma is successfully read-locked
>
>     vms_clean_up_area(mas_detach)
>       vms_clear_ptes
>                   // steps on a cleared PTE

So here we have the added complexity that the vma is not unhooked at
all. Is there anything that would prevent a concurrent gup_fast() from
doing the same -- touch a cleared PTE?

AFAICT two threads, one doing overlapping mmap() and the other doing
gup_fast() can result in exactly this scenario.

If we don't care about the GUP case, then I'm thinking we should not
care about the lockless RCU case either.

>   __mmap_new_vma
>     vma_set_range // installs new vma in the range
>   __mmap_complete
>     vms_complete_munmap_vmas // vmas are write-locked and detached,
>                              // but it's too late

But at this point that old vma really is unhooked, and the
vma_start_write() here will ensure readers are gone and it will clear
PTEs *again*.