Re: possible deadlock in get_user_pages_unlocked

From: Al Viro
Date: Fri Feb 02 2018 - 00:46:55 EST


On Thu, Feb 01, 2018 at 09:35:02PM -0800, Eric Biggers wrote:

> Try starting up multiple instances of the program; that sometimes helps with
> these races that are hard to hit (since you may e.g. have a different number of
> CPUs than syzbot used). If I start up 4 instances I see the lockdep splat after
> around 2-5 seconds.

5 instances in parallel, 10 minutes into the run...

> This is on latest Linus tree (4bf772b1467). Also note the
> reproducer uses KVM, so if you're running it in a VM it will only work if you've
> enabled nested virtualization on the host (kvm_intel.nested=1).

cat /sys/module/kvm_amd/parameters/nested
1

on host

> Also it appears to go away if I revert ce53053ce378c21 ("kvm: switch
> get_user_page_nowait() to get_user_pages_unlocked()").

That simply keeps this reproducer from hitting get_user_pages_unlocked();
it goes through grab mmap_sem/get_user_pages/drop mmap_sem instead.  I.e.
it does not allow __get_user_pages_locked() to drop/regain ->mmap_sem.

The bug may be in the way we call get_user_pages_unlocked() in that
commit, but it might easily be a bug in __get_user_pages_locked()
exposed by that reproducer somehow.