Re: [PATCH 3/5] KVM: Conditionally reschedule when resetting the dirty ring

From: Yan Zhao
Date: Mon Jan 13 2025 - 02:05:13 EST


On Fri, Jan 10, 2025 at 05:04:07PM -0800, Sean Christopherson wrote:
> When resetting a dirty ring, conditionally reschedule on each iteration
> after the first. The recently introduced hard limit mitigates the issue
> of an endless reset, but isn't sufficient to completely prevent RCU
> stalls, soft lockups, etc., nor is the hard limit intended to guard
> against such badness.
>
> Note! Take care to check for reschedule even in the "continue" paths,
> as a pathological scenario (or malicious userspace) could dirty the same
> gfn over and over, i.e. always hit the continue path.
>
> rcu: INFO: rcu_sched self-detected stall on CPU
> rcu: 4-....: (5249 ticks this GP) idle=51e4/1/0x4000000000000000 softirq=309/309 fqs=2563
> rcu: (t=5250 jiffies g=-319 q=608 ncpus=24)
> CPU: 4 UID: 1000 PID: 1067 Comm: dirty_log_test Tainted: G L 6.13.0-rc3-17fa7a24ea1e-HEAD-vm #814
> Tainted: [L]=SOFTLOCKUP
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> RIP: 0010:kvm_arch_mmu_enable_log_dirty_pt_masked+0x26/0x200 [kvm]
> Call Trace:
> <TASK>
> kvm_reset_dirty_gfn.part.0+0xb4/0xe0 [kvm]
> kvm_dirty_ring_reset+0x58/0x220 [kvm]
> kvm_vm_ioctl+0x10eb/0x15d0 [kvm]
> __x64_sys_ioctl+0x8b/0xb0
> do_syscall_64+0x5b/0x160
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> </TASK>
> Tainted: [L]=SOFTLOCKUP
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> RIP: 0010:kvm_arch_mmu_enable_log_dirty_pt_masked+0x17/0x200 [kvm]
> Call Trace:
> <TASK>
> kvm_reset_dirty_gfn.part.0+0xb4/0xe0 [kvm]
> kvm_dirty_ring_reset+0x58/0x220 [kvm]
> kvm_vm_ioctl+0x10eb/0x15d0 [kvm]
> __x64_sys_ioctl+0x8b/0xb0
> do_syscall_64+0x5b/0x160
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> </TASK>
>
> Fixes: fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
> virt/kvm/dirty_ring.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index a81ad17d5eef..37eb2b7142bd 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -133,6 +133,16 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
>
> ring->reset_index++;
> (*nr_entries_reset)++;
> +
> + /*
> + * While the size of each ring is fixed, it's possible for the
> + * ring to be constantly re-dirtied/harvested while the reset
> + * is in-progress (the hard limit exists only to guard against
> + * wrapping the count into negative space).
> + */
> + if (!first_round)
> + cond_resched();
> +
Will cond_resched() per entry be too frequent?
Could we combine the cond_resched() per ring? e.g.

if (count >= ring->soft_limit)
cond_resched();

or simply
while (count < ring->size) {
...
}


> /*
> * Try to coalesce the reset operations when the guest is
> * scanning pages in the same slot.
> --
> 2.47.1.613.gc27f4b7a9f-goog
>