Re: [PATCH] mm/rmap: remove unnecessary page_table_lock
From: Liam R. Howlett
Date: Tue Apr 23 2024 - 13:11:54 EST
* Yajun Deng <yajun.deng@xxxxxxxxx> [240423 04:35]:
> April 23, 2024 at 4:18 PM, "David Hildenbrand" <david@xxxxxxxxxx> wrote:
>
>
>
> >
> > On 23.04.24 09:53, Yajun Deng wrote:
> >
> > >
> > > April 22, 2024 at 7:24 PM, "David Hildenbrand" <david@xxxxxxxxxx> wrote:
> > >
> > > > >>
> > >
> > > >
> > > > On 22.04.24 12:52, Yajun Deng wrote:
> > > >
> > >
> > > page_table_lock is a lock that for page table, we won't change page
> > >
> > > table in __anon_vma_prepare(). As we can see, it works well in
> > >
> > > anon_vma_clone(). They do the same operation.
> > >
> > > >
> > > > We are reusing mm->page_table_lock to serialize, not the *actual* low-level page table locks that really protect PTEs.
> > > >
> > > > With that locking gone, there would be nothing protection vma->anon_vma.
> > > >
> > > > Note that anon_vma_clone() is likely called with the mmap_lock held in write mode, which is not the case for __anon_vma_prepare() ...
> > > >
> > >
> > > Yes, anon_vma_clone() is called with the mmap_lock held. I added mmap_assert_write_locked(dst->vm_mm) to prove it.
> > >
> > > I added mmap_assert_write_locked(vma->vm_mm) in __anon_vma_prepare() at the same time, it shows __anon_vma_prepare()
> > >
> > > is also called with the mmap_lock held too.
> > >
> >
> > Make sure you actually have lockdep built in and enabled.
> >
>
> This is my config.
> CONFIG_LOCKDEP=n
> CONFIG_DEBUG_VM=y
>
> I did another test.
> I put mmap_assert_write_locked(mm) before 'set_bit(MMF_OOM_SKIP, &mm->flags)' in mmap.c, it's outside the lock.
> It will crash when on boot. I think mmap_assert_write_locked() works.
If you are changing locks, then please test with lockdep on.
>
>
> > __anon_vma_prepare() is for example called from do_anonymous_page() where we might only hold the mmap_lock in read mode (or not at all IIRC with VMA in read mode).
Consider two concurrent readers getting to this function with the same
vma. There is no mergeable anon vma, so both create a new anon_vma.
You take the anon_vma lock to parallelize the linking to the vma - but
they are different locks because they are both new anon_vma structs
allocated by concurrent readers.
You now need a lock that you know cannot allow this to happen. Looking
at the top of mm/rmap.c and see which one works. The next in line is
the page table lock, so that one is used here.
What if we reverse the locks? We can deadlock.
What if we don't take the anon_vma lock? We can have two writers to the
anon_vma.
Thanks,
Liam