Re: [PATCH] Documentation: kvm: fix SRCU locking order docs

From: Boqun Feng
Date: Fri Jan 13 2023 - 19:02:51 EST


On Fri, Jan 13, 2023 at 10:33:33AM +0000, David Woodhouse wrote:
> On Fri, 2023-01-13 at 10:20 +0100, Paolo Bonzini wrote:
> > On 1/13/23 08:18, Boqun Feng wrote:
> > > On Thu, Jan 12, 2023 at 07:20:48AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Jan 12, 2023 at 08:24:16AM +0000, David Woodhouse wrote:
> > > > > On Wed, 2023-01-11 at 13:30 -0500, Paolo Bonzini wrote:
> > > > > >
> > > > > > +- ``synchronize_srcu(&kvm->srcu)`` is called inside critical sections
> > > > > > +  for kvm->lock, vcpu->mutex and kvm->slots_lock.  These locks _cannot_
> > > > > > +  be taken inside a kvm->srcu read-side critical section; that is, the
> > > > > > +  following is broken::
> > > > > > +
> > > > > > +      srcu_read_lock(&kvm->srcu);
> > > > > > +      mutex_lock(&kvm->slots_lock);
> > > > > > +
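
(For contrast, the ordering the documentation permits is to take the
mutex first and only then enter the read-side critical section, e.g.:

	mutex_lock(&kvm->slots_lock);
	idx = srcu_read_lock(&kvm->srcu);
	...
	srcu_read_unlock(&kvm->srcu, idx);
	mutex_unlock(&kvm->slots_lock);

assuming, of course, that nothing in between calls synchronize_srcu()
on kvm->srcu.)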
> > > > >
> > > > > "Don't tell me. Tell lockdep!"
> > > > >
> > > > > Did we conclude in
> > > > > https://lore.kernel.org/kvm/122f38e724aae9ae8ab474233da1ba19760c20d2.camel@xxxxxxxxxxxxx/
> > > > > that lockdep *could* be clever enough to catch a violation of this rule
> > > > > by itself?
> > > > >
> > > > > The general case of the rule would be that 'if mutex A is taken in a
> > > > > read-section for SRCU B, then any synchronize_srcu(B) while mutex A is
> > > > > held shall be verboten'. And vice versa.
> > > > >
> > > > > If we can make lockdep catch it automatically, yay!
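
To make the rule concrete, here is a minimal sketch of the two sides
that deadlock (the "my_srcu"/"my_mutex" names are made up for
illustration, they are not anything in the KVM code):

	DEFINE_STATIC_SRCU(my_srcu);
	static DEFINE_MUTEX(my_mutex);

	/* Reader: takes the mutex inside an SRCU read-side section. */
	static void reader(void)
	{
		int idx = srcu_read_lock(&my_srcu);

		mutex_lock(&my_mutex);
		/* ... */
		mutex_unlock(&my_mutex);

		srcu_read_unlock(&my_srcu, idx);
	}

	/* Updater: calls synchronize_srcu() with the mutex held. */
	static void updater(void)
	{
		mutex_lock(&my_mutex);
		/*
		 * If reader() is blocked on my_mutex, its read-side
		 * critical section never ends, the grace period never
		 * completes, and this never returns.
		 */
		synchronize_srcu(&my_srcu);
		mutex_unlock(&my_mutex);
	}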
> > > >
> > > > Unfortunately, lockdep needs to see a writer to complain, and that patch
> > > > just adds a reader.  And adding that writer would make lockdep complain
> > > > about things that are perfectly fine.  It should be possible to make
> > > > lockdep catch this sort of thing, but from what I can see, doing so
> > > > requires modifications to lockdep itself.
> > > >
> > >
> > > Please see if the following patchset works:
> > >
> > >         https://lore.kernel.org/lkml/20230113065955.815667-1-boqun.feng@xxxxxxxxx
> > >
> > > "I have been called. I must answer. Always." ;-)
>
> Amazing! Thank you, Boqun!
>

Thank you for trying it out ;-)
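
For the record, the basic idea of the series is to have
synchronize_srcu() report an update-side acquisition of the
srcu_struct's dep_map to lockdep right before the grace period, and
release it immediately, so that nothing is considered held across the
actual wait. A very rough sketch of where the annotation sits (the
series introduces a dedicated lock_sync() primitive instead of reusing
a plain acquire/release pair, precisely to avoid the false positives
Paul mentioned; see the patches for the real details):

	/* Sketch only, not the actual patch. */
	void synchronize_srcu(struct srcu_struct *ssp)
	{
		/* Record the update-side dependency ... */
		lock_map_acquire(&ssp->dep_map);
		/* ... but hold nothing while actually waiting. */
		lock_map_release(&ssp->dep_map);

		/* ... the existing grace-period wait goes here ... */
	}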

> > It's missing an important testcase; if it passes (does not warn), then
> > it should work:
>
> I think it does.
>
> I started with kvm/master from
> https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=master so that
> lockdep didn't find other things to complain about first. I then:
>
> • Dropped the last two commits, putting us back to using kvm->lock and
>   removing the dummy mutex lock that would have told lockdep that it's
>   a (different) problem.
>
> • I then added Boqun's three commits.
>
> • Reverted a79b53aa so that the synchronize_srcu() deadlock returns.
>
> • Couldn't reproduce with xen_shinfo_test, so added Michal's test from
>   https://lore.kernel.org/kvm/15599980-bd2e-b6c2-1479-e1eef02da0b5@xxxxxxx/
>
> The resulting tree is at
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/kvm-srcu-lockdep
>
>
> Now when I run tools/testing/selftests/kvm/x86_64/deadlocks_test I do
> see lockdep complain about it (shown below; I have a cosmetic
> nit/request to make). If I restore the evtchn_reset fix then it also
> complains about kvm_xen_set_evtchn() vs. kvm_xen_hvm_set_attr(), and if
> I restore the xen_lock fix from the tip of kvm/master then Michal's
> deadlock_test passes and there are no complaints.
>
> So everything seems to be working as it should... *except* for the fact
> that I don't quite understand why xen_shinfo_test didn't trigger the
> warning. Michal, I guess you already worked that out when you came up
> with your deadlock-test instead... is there something we should add to
> xen_shinfo_test that would mean it *would* have triggered? I even tried
> this:
>
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1173,6 +1173,16 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
>  	if (init_srcu_struct(&kvm->irq_srcu))
>  		goto out_err_no_irq_srcu;
>  
> +#ifdef CONFIG_LOCKDEP
> +	/*
> +	 * Ensure lockdep knows that it's not permitted to lock kvm->lock
> +	 * from a SRCU read section on kvm->srcu.
> +	 */
> +	mutex_lock(&kvm->lock);
> +	synchronize_srcu(&kvm->srcu);
> +	mutex_unlock(&kvm->lock);
> +#endif
> +
>  	refcount_set(&kvm->users_count, 1);
>  	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>  		for (j = 0; j < 2; j++) {
>
>
>
>
> > [ 845.474169] ======================================================
> > [ 845.474170] WARNING: possible circular locking dependency detected
> > [ 845.474172] 6.2.0-rc3+ #1025 Tainted: G E
> > [ 845.474175] ------------------------------------------------------
> > [ 845.474176] deadlocks_test/22767 is trying to acquire lock:
> > [ 845.474178] ffffc9000ba4b868 (&kvm->srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x5/0x170
> > [ 845.474192]
> > but task is already holding lock:
> > [ 845.474194] ffffc9000ba423c0 (&kvm->lock){+.+.}-{3:3}, at: kvm_vm_ioctl_set_msr_filter+0x188/0x220 [kvm]
> > [ 845.474319]
> > which lock already depends on the new lock.
>
> So the above part is good, and clearly tells me it was synchronize_srcu()
>
> > [ 845.474320]
> > the existing dependency chain (in reverse order) is:
> > [ 845.474322]
> > -> #1 (&kvm->lock){+.+.}-{3:3}:
> > [ 845.474327] __lock_acquire+0x4b4/0x940
> > [ 845.474333] lock_acquire.part.0+0xa8/0x210
> > [ 845.474337] __mutex_lock+0x94/0x920
> > [ 845.474344] kvm_xen_set_evtchn.part.0+0x6d/0x170 [kvm]
> > [ 845.474437] kvm_xen_inject_timer_irqs+0x79/0xa0 [kvm]
> > [ 845.474529] vcpu_run+0x20c/0x450 [kvm]
> > [ 845.474618] kvm_arch_vcpu_ioctl_run+0x1df/0x670 [kvm]
> > [ 845.474707] kvm_vcpu_ioctl+0x279/0x700 [kvm]
> > [ 845.474783] __x64_sys_ioctl+0x8a/0xc0
> > [ 845.474787] do_syscall_64+0x3b/0x90
> > [ 845.474796] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> > [ 845.474804]
> > -> #0 (&kvm->srcu){.+.+}-{0:0}:
> > [ 845.474809] check_prev_add+0x8f/0xc20
> > [ 845.474812] validate_chain+0x3ba/0x450
> > [ 845.474814] __lock_acquire+0x4b4/0x940
> > [ 845.474817] lock_sync+0x99/0x110
> > [ 845.474820] __synchronize_srcu+0x4d/0x170
> > [ 845.474824] kvm_vm_ioctl_set_msr_filter+0x1a5/0x220 [kvm]
> > [ 845.474907] kvm_arch_vm_ioctl+0x8df/0xd50 [kvm]
> > [ 845.474997] kvm_vm_ioctl+0x5ca/0x800 [kvm]
> > [ 845.475075] __x64_sys_ioctl+0x8a/0xc0
> > [ 845.475079] do_syscall_64+0x3b/0x90
> > [ 845.475084] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> > [ 845.475089]
> > other info that might help us debug this:
> >
> > [ 845.475091] Possible unsafe locking scenario:
> >
> > [ 845.475092]        CPU0                    CPU1
> > [ 845.475093]        ----                    ----
> > [ 845.475095]   lock(&kvm->lock);
> > [ 845.475098]                                lock(&kvm->srcu);
> > [ 845.475101]                                lock(&kvm->lock);
> > [ 845.475103]   lock(&kvm->srcu);
> > [ 845.475106]
> > *** DEADLOCK ***
>
> But is there any chance the above could say 'synchronize_srcu' and
> 'srcu_read_lock' in the appropriate places?
>

That requires some non-trivial rework of the locking-scenario
printing; it's on my TODO list...

That said, we can improve the "CPU0" part, since by the time we print
it we already have all the information about the two locks involved.
I've done a POC at:

https://lore.kernel.org/lkml/20230113235722.1226525-1-boqun.feng@xxxxxxxxx

which should improve the printout a little. For example, the above
scenario will now be shown as:

[..]        CPU0                    CPU1
[..]        ----                    ----
[..]   lock(&kvm->lock);
[..]                                lock(&kvm->srcu);
[..]                                lock(&kvm->lock);
[..]   sync(&kvm->srcu);
[..]

Regards,
Boqun

> > [ 845.475108] 1 lock held by deadlocks_test/22767:
> > [ 845.475110] #0: ffffc9000ba423c0 (&kvm->lock){+.+.}-{3:3}, at: kvm_vm_ioctl_set_msr_filter+0x188/0x220 [kvm]
> > [ 845.475200]
> > stack backtrace:
> > [ 845.475202] CPU: 10 PID: 22767 Comm: deadlocks_test Tainted: G E 6.2.0-rc3+ #1025
> > [ 845.475206] Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
> > [ 845.475208] Call Trace:
> > [ 845.475210] <TASK>
> > [ 845.475214] dump_stack_lvl+0x56/0x73
> > [ 845.475221] check_noncircular+0x102/0x120
> > [ 845.475229] ? check_noncircular+0x7f/0x120
> > [ 845.475236] check_prev_add+0x8f/0xc20
> > [ 845.475239] ? add_chain_cache+0x10b/0x2d0
> > [ 845.475244] validate_chain+0x3ba/0x450
> > [ 845.475249] __lock_acquire+0x4b4/0x940
> > [ 845.475253] ? __synchronize_srcu+0x5/0x170
> > [ 845.475258] lock_sync+0x99/0x110
> > [ 845.475261] ? __synchronize_srcu+0x5/0x170
> > [ 845.475265] __synchronize_srcu+0x4d/0x170
> > [ 845.475269] ? mark_held_locks+0x49/0x80
> > [ 845.475272] ? _raw_spin_unlock_irqrestore+0x2d/0x60
> > [ 845.475278] ? __pfx_read_tsc+0x10/0x10
> > [ 845.475286] ? ktime_get_mono_fast_ns+0x3d/0x90
> > [ 845.475292] kvm_vm_ioctl_set_msr_filter+0x1a5/0x220 [kvm]
> > [ 845.475380] kvm_arch_vm_ioctl+0x8df/0xd50 [kvm]
> > [ 845.475472] ? __lock_acquire+0x4b4/0x940
> > [ 845.475485] kvm_vm_ioctl+0x5ca/0x800 [kvm]
> > [ 845.475566] ? lockdep_unregister_key+0x76/0x110
> > [ 845.475575] __x64_sys_ioctl+0x8a/0xc0
> > [ 845.475579] do_syscall_64+0x3b/0x90
> > [ 845.475586] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> > [ 845.475591] RIP: 0033:0x7f79de23fd1b
> > [ 845.475595] Code: 73 01 c3 48 8b 0d 05 a1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 a0 1b 00 f7 d8 64 89 01 48
> > [ 845.475598] RSP: 002b:00007f79ddff7c98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > [ 845.475602] RAX: ffffffffffffffda RBX: 00007f79ddff8640 RCX: 00007f79de23fd1b
> > [ 845.475605] RDX: 00007f79ddff7ca0 RSI: 000000004188aec6 RDI: 0000000000000004
> > [ 845.475607] RBP: 00007f79ddff85c0 R08: 0000000000000000 R09: 00007fffceb1ff2f
> > [ 845.475609] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f79ddff7ca0
> > [ 845.475611] R13: 0000000001c322a0 R14: 00007f79de2a05f0 R15: 0000000000000000
> > [ 845.475617] </TASK>
>
>