Re: possible deadlock in get_user_pages_unlocked
From: Al Viro
Date: Fri Feb 02 2018 - 01:20:58 EST
On Fri, Feb 02, 2018 at 05:46:26AM +0000, Al Viro wrote:
> On Thu, Feb 01, 2018 at 09:35:02PM -0800, Eric Biggers wrote:
>
> > Try starting up multiple instances of the program; that sometimes helps with
> > these races that are hard to hit (since you may e.g. have a different number of
> > CPUs than syzbot used). If I start up 4 instances I see the lockdep splat after
> > around 2-5 seconds.
>
> 5 instances in parallel, 10 minutes into the run...
>
> > This is on latest Linus tree (4bf772b1467). Also note the
> > reproducer uses KVM, so if you're running it in a VM it will only work if you've
> > enabled nested virtualization on the host (kvm_intel.nested=1).
>
> cat /sys/module/kvm_amd/parameters/nested
> 1
>
> on host
>
> > Also it appears to go away if I revert ce53053ce378c21 ("kvm: switch
> > get_user_page_nowait() to get_user_pages_unlocked()").
>
> That simply prevents this reproducer from hitting get_user_pages_unlocked();
> it goes through grab mmap_sem/get_user_pages/drop mmap_sem instead. I.e. it
> does not allow __get_user_pages_locked() to drop/regain ->mmap_sem.
>
> The bug may be in the way we call get_user_pages_unlocked() in that
> commit, but it might easily be a bug in __get_user_pages_locked()
> exposed by that reproducer somehow.

I think I understand what's going on. FOLL_NOWAIT handling is a serious
mess ;-/ I'll probably have something to test tomorrow - I still can't
reproduce it here, unfortunately.