Re: [syzbot] [net?] possible deadlock in vm_insert_page

From: Boqun Feng
Date: Mon Dec 30 2024 - 13:32:02 EST


On Mon, Dec 30, 2024 at 10:22:27AM -0800, Suren Baghdasaryan wrote:
[...]
> > > >
> > > > Also a quick look seems to suggest that the lock dependency on CPU 1:
> > > >
> > > > lock(&vma->vm_lock->lock);
> > > > lock(sb_pagefaults#4);
> > > >
> > > > can happen in a page fault with a reader of &vma->vm_lock->lock.
> > >
> > > The report clearly indicates a call to vma_start_write(), which means
> > > vm_lock is being write-locked, not read-locked. That's why I commented
> > > that the report does not consider that mmap_write_lock is already
> > > taken when vma_start_write() is called.
> > >
> > > >
> > > > do_page_fault():
> > > > lock_vma_under_rcu():
> > > > vma_start_read():
> > > > down_read_trylock(); // read lock &vma->vm_lock->lock here.
> > > > ...
> > > > handle_mm_fault():
> > > > sb_start_pagefault(); // lock(sb_pagefaults#4);
> > > >
> > > > if so, an existing reader can block the other writer, so I don't think
> > > > the mmap_lock write protection can help here.
> > >
> > > In your example vma->vm_lock would be read-locked before
> > > po->pg_vec_lock but in the report po->pg_vec_lock is locked before
> > > vma->vm_lock->lock. I don't think what is reported here is the
> > > do_page_fault() path.
> > >
> >
> > You're missing the point, in the report, the current stack is indeed in
> > a write path (i.e. &mm->mmap_lock first and then &vma->vm_lock->lock),
> > however that's only part of the picture. The deadlock
> > possibility is due to that there could be a concurrent do_page_fault()
> > which will hold &vma->vm_lock->lock first and wait for another lock that
> > eventually has a dependency on a &mm->mmap_lock.
>
> I need to see a more concrete example.
> Note that do_page_fault() does not even read-lock the mmap_lock when
> it uses vma->vm_lock, that's the whole point of per-vma locks that we
> avoid using mmap_lock. So, even if it later waits on some other lock
> that has mm->mmap_lock dependency, that should not block it.
> Again, you might be right and there might be a lockdep issue but I
> need a more specific example to see if it's real.
>

Understood. I clearly don't have the whole set of knowledge/skills to
make the call ;-) I just tried my best to figure out what lockdep
thought in this case (see the other email); it's quite fun to hunt down
a "deadlock" possibility involving 11 locks. Right now, I'm leaning
towards this being a false positive (about 80% sure) because one of the
dependencies was built during initcall, so it may not happen in real
code, but I need to defer that to the drm folks.
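
For anyone following along, here is a toy userspace model of the kind of
ordering check involved. It is only a sketch and not lockdep's real data
structures: each "B acquired while A held" event becomes an edge A -> B
that is kept forever, and a report fires when a new edge closes a cycle.
The lock names are borrowed from this thread, but OTHER_LOCK and the two
edges through it are hypothetical stand-ins for the rest of the chain,
and lock_graph, graph_init(), reachable() and record_dependency() are
names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 16

/* Illustrative lock classes; OTHER_LOCK is a hypothetical placeholder. */
enum { MMAP_LOCK, VMA_LOCK, SB_PAGEFAULTS, OTHER_LOCK };

/* dep[a][b] is true once lock b was ever acquired while a was held. */
struct lock_graph {
	bool dep[MAX_LOCKS][MAX_LOCKS];
};

void graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_LOCKS; next++) {
		if (g->dep[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record "acquired while held". Returns true when the new edge closes a
 * cycle, i.e. 'held' was already reachable from 'acquired'. Edges are
 * never removed, so one recorded once (say, during init) still counts.
 */
bool record_dependency(struct lock_graph *g, int held, int acquired)
{
	bool seen[MAX_LOCKS] = { false };
	bool cycle;

	assert(held >= 0 && held < MAX_LOCKS);
	assert(acquired >= 0 && acquired < MAX_LOCKS);

	cycle = reachable(g, acquired, held, seen);
	g->dep[held][acquired] = true;
	return cycle;
}
```

Feeding it SB_PAGEFAULTS -> OTHER_LOCK and OTHER_LOCK -> MMAP_LOCK (the
made-up "recorded once" edges), then VMA_LOCK -> SB_PAGEFAULTS (the fault
path), reports nothing; the write path's MMAP_LOCK -> VMA_LOCK is what
closes the cycle. The model has no idea whether the code that recorded
the early edges can ever run concurrently with the others, which is the
part that needs a human call.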

Regards,
Boqun

> >
> > Regards,
> > Boqun
> >
[...]