Re: [PATCH] mm/rmap: remove unnecessary page_table_lock

From: Yajun Deng
Date: Tue Apr 23 2024 - 23:06:10 EST


April 24, 2024 at 1:11 AM, "Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx> wrote:



>
> * Yajun Deng <yajun.deng@xxxxxxxxx> [240423 04:35]:
>
> >
> > April 23, 2024 at 4:18 PM, "David Hildenbrand" <david@xxxxxxxxxx> wrote:
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On 23.04.24 09:53, Yajun Deng wrote:
> >
> >
> >
> > >
> >
> > > April 22, 2024 at 7:24 PM, "David Hildenbrand" <david@xxxxxxxxxx> wrote:
> >
> > >
> >
> > > > >>
> >
> > >
> >
> > > >
> >
> > > > On 22.04.24 12:52, Yajun Deng wrote:
> >
> > > >
> >
> > >
> >
> > > page_table_lock is a lock that for page table, we won't change page
> >
> > >
> >
> > > table in __anon_vma_prepare(). As we can see, it works well in
> >
> > >
> >
> > > anon_vma_clone(). They do the same operation.
> >
> > >
> >
> > > >
> >
> > > > We are reusing mm->page_table_lock to serialize, not the *actual* low-level page table locks that really protect PTEs.
> >
> > > >
> >
> > > > With that locking gone, there would be nothing protection vma->anon_vma.
> >
> > > >
> >
> > > > Note that anon_vma_clone() is likely called with the mmap_lock held in write mode, which is not the case for __anon_vma_prepare() ...
> >
> > > >
> >
> > >
> >
> > > Yes, anon_vma_clone() is called with the mmap_lock held. I added mmap_assert_write_locked(dst->vm_mm) to prove it.
> >
> > >
> >
> > > I added mmap_assert_write_locked(vma->vm_mm) in __anon_vma_prepare() at the same time, it shows __anon_vma_prepare()
> >
> > >
> >
> > > is also called with the mmap_lock held too.
> >
> > >
> >
> >
> >
> > Make sure you actually have lockdep built in and enabled.
> >
> >
> >
> >
> >
> > This is my config.
> >
> > CONFIG_LOCKDEP=n
> >
> > CONFIG_DEBUG_VM=y
> >
> >
> >
> > I did another test.
> >
> > I put mmap_assert_write_locked(mm) before 'set_bit(MMF_OOM_SKIP, &mm->flags)' in mmap.c, it's outside the lock.
> >
> > It will crash when on boot. I think mmap_assert_write_locked() works.
> >
>
> If you are changing locks, then please test with lockdep on.
>

It's my fault. It shows a warning with lockdep on.

> >
> > __anon_vma_prepare() is for example called from do_anonymous_page() where we might only hold the mmap_lock in read mode (or not at all IIRC with VMA in read mode).
> >
>
> Consider two concurrent readers getting to this function with the same
>
> vma. There is no mergeable anon vma, so both create a new anon_vma.
>
> You take the anon_vma lock to parallelize the linking to the vma - but
>
> they are different locks because they are both new anon_vma structs
>
> allocated by concurrent readers.
>
> You now need a lock that you know cannot allow this to happen. Looking
>
> at the top of mm/rmap.c and see which one works. The next in line is
>
> the page table lock, so that one is used here.
>
> What if we reverse the locks? We can deadlock.
>
> What if we don't take the anon_vma lock? We can have two writers to the
>
> anon_vma.
>
> Thanks,
>
> Liam
>