Re: REGRESSION: lockdep warning triggered by 15b9ce7ecd: virtio_balloon: stay awake while adjusting balloon

From: David Stevens
Date: Tue Jan 09 2024 - 00:50:46 EST


On Tue, Jan 9, 2024 at 6:50 AM Theodore Ts'o <tytso@xxxxxxx> wrote:
>
> Hi, while doing final testing before sending a pull request, I merged
> in linux-next, and commit 5b9ce7ecd7: virtio_balloon: stay awake while
> adjusting balloon seems to be causing a lockdep warning (see attached)
> when running gce-xfstests on a Google Compute Engine e2 VM. I was not
> able to trigger it using kvm-xfstests, but the following command:
> "gce-xfstests -C 10 ext4/4k generic/476) was sufficient to triger the
> problem. For more information please see [1] and [2].
>
> [1] https://github.com/tytso/xfstests-bld/blob/master/Documentation/gce-xfstests.md
> [2] https://thunk.org/gce-xfstests
>
> I found it by looking at the git logs, and this commit aroused my
> suspicions, and I further testing showed that the lockdep warning was
> reproducible with this commit, but not when testing with the
> immediately preceeding commit (15b9ce7ecd^).
>
> Cheers,
>
> - Ted
>
>
> root: ext4/4k run xfstest generic/476
> systemd[1]: Started fstests-generic-476.scope - /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/476.
> kernel: [ 399.361181] EXT4-fs (dm-1): mounted filesystem 840e25bd-f650-4819-8562-7eded85ef370 r/w with ordered data mode. Quota mode: none.
> systemd[1]: fstests-generic-476.scope: Deactivated successfully.
> systemd[1]: fstests-generic-476.scope: Consumed 3min 1.966s CPU time.
> systemd[1]: xt\x2dvdb.mount: Deactivated successfully.
> kernel: [ 537.085404] EXT4-fs (dm-0): unmounting filesystem d3d7a675-f7b6-4384-abec-2e60d885b6da.
> systemd[1]: xt\x2dvdc.mount: Deactivated successfully.
> kernel: [ 540.565870]
> kernel: [ 540.567523] ================================
> kernel: [ 540.572007] WARNING: inconsistent lock state
> kernel: [ 540.576407] 6.7.0-rc3-xfstests-lockdep-00012-g5b9ce7ecd715 #318 Not tainted
> kernel: [ 540.583532] --------------------------------
> kernel: [ 540.587928] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> kernel: [ 540.594326] kworker/0:3/329 [HC0[0]:SC0[0]:HE1:SE1] takes:
> kernel: [ 540.599955] ffff90b280a548c0 (&vb->adjustment_lock){?...}-{2:2}, at: update_balloon_size_func+0x33/0x190
> kernel: [ 540.609926] {IN-HARDIRQ-W} state was registered at:
> kernel: [ 540.614935] __lock_acquire+0x3f2/0xb30
> kernel: [ 540.618992] lock_acquire+0xbf/0x2b0
> kernel: [ 540.622786] _raw_spin_lock_irqsave+0x43/0x90
> kernel: [ 540.627366] virtballoon_changed+0x51/0xd0
> kernel: [ 540.631947] virtio_config_changed+0x5a/0x70
> kernel: [ 540.636437] vp_config_changed+0x11/0x20
> kernel: [ 540.640576] __handle_irq_event_percpu+0x88/0x230
> kernel: [ 540.645500] handle_irq_event+0x38/0x80
> kernel: [ 540.649558] handle_edge_irq+0x8f/0x1f0
> kernel: [ 540.653791] __common_interrupt+0x47/0xf0
> kernel: [ 540.658106] common_interrupt+0x79/0xa0
> kernel: [ 540.661672] EXT4-fs (dm-1): unmounting filesystem 840e25bd-f650-4819-8562-7eded85ef370.
> kernel: [ 540.663183] asm_common_interrupt+0x26/0x40
> kernel: [ 540.663190] acpi_safe_halt+0x1b/0x30
> kernel: [ 540.663196] acpi_idle_enter+0x7b/0xd0
> kernel: [ 540.663199] cpuidle_enter_state+0x90/0x4f0
> kernel: [ 540.688723] cpuidle_enter+0x2d/0x40
> kernel: [ 540.692516] cpuidle_idle_call+0xe4/0x120
> kernel: [ 540.697036] do_idle+0x84/0xd0
> kernel: [ 540.700393] cpu_startup_entry+0x2a/0x30
> kernel: [ 540.704588] rest_init+0xe9/0x180
> kernel: [ 540.708118] arch_call_rest_init+0xe/0x30
> kernel: [ 540.712426] start_kernel+0x41c/0x4b0
> kernel: [ 540.716310] x86_64_start_reservations+0x18/0x30
> kernel: [ 540.721164] x86_64_start_kernel+0x8c/0x90
> kernel: [ 540.725737] secondary_startup_64_no_verify+0x178/0x17b
> kernel: [ 540.731432] irq event stamp: 22681
> kernel: [ 540.734956] hardirqs last enabled at (22681): [<ffffffff8b4b5158>] _raw_spin_unlock_irq+0x28/0x50
> kernel: [ 540.744564] hardirqs last disabled at (22680): [<ffffffff8b4b4ded>] _raw_spin_lock_irq+0x5d/0x90
> kernel: [ 540.753475] softirqs last enabled at (22076): [<ffffffff8a58cfa1>] srcu_invoke_callbacks+0x101/0x1c0
> kernel: [ 540.762904] softirqs last disabled at (22072): [<ffffffff8a58cfa1>] srcu_invoke_callbacks+0x101/0x1c0
> kernel: [ 540.773298]
> kernel: [ 540.773298] other info that might help us debug this:
> kernel: [ 540.780207] Possible unsafe locking scenario:
> kernel: [ 540.780207]
> kernel: [ 540.786438] CPU0
> kernel: [ 540.789007] ----
> kernel: [ 540.791766] lock(&vb->adjustment_lock);
> kernel: [ 540.796014] <Interrupt>
> kernel: [ 540.798778] lock(&vb->adjustment_lock);
> kernel: [ 540.803605]

Oh, that's embarrassing, I completely whiffed on interactions with
interrupts. The following patch fixes it, and I've locally repro'ed
the issue and verified the fix. What's the process for getting this
fix merged? Does it get merged as a seperatch patch, or squashed into
the original commit?