Re: [patch] KVM: SVM: Periodically schedule when unregistering regions on destroy

From: David Rientjes
Date: Fri Sep 11 2020 - 03:57:48 EST


Paolo, ping?

On Tue, 25 Aug 2020, David Rientjes wrote:

> There may be many encrypted regions that need to be unregistered when a
> SEV VM is destroyed. This can lead to soft lockups. For example, on a
> host running 4.15:
>
> watchdog: BUG: soft lockup - CPU#206 stuck for 11s! [t_virtual_machi:194348]
> CPU: 206 PID: 194348 Comm: t_virtual_machi
> RIP: 0010:free_unref_page_list+0x105/0x170
> ...
> Call Trace:
> [<0>] release_pages+0x159/0x3d0
> [<0>] sev_unpin_memory+0x2c/0x50 [kvm_amd]
> [<0>] __unregister_enc_region_locked+0x2f/0x70 [kvm_amd]
> [<0>] svm_vm_destroy+0xa9/0x200 [kvm_amd]
> [<0>] kvm_arch_destroy_vm+0x47/0x200
> [<0>] kvm_put_kvm+0x1a8/0x2f0
> [<0>] kvm_vm_release+0x25/0x30
> [<0>] do_exit+0x335/0xc10
> [<0>] do_group_exit+0x3f/0xa0
> [<0>] get_signal+0x1bc/0x670
> [<0>] do_signal+0x31/0x130
>
> Although the CLFLUSH is no longer issued on every encrypted region to be
> unregistered, there are no other changes that can prevent soft lockups for
> very large SEV VMs in the latest kernel.
>
> Call cond_resched() after each region is unregistered so the loop can
> yield the CPU when a reschedule is pending. kvm->lock is still held across
> the resched, but since this only happens when the VM is destroyed, this is
> assumed to be acceptable.
>
> Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
> ---
> arch/x86/kvm/svm/sev.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1106,6 +1106,7 @@ void sev_vm_destroy(struct kvm *kvm)
>  		list_for_each_safe(pos, q, head) {
>  			__unregister_enc_region_locked(kvm,
>  				list_entry(pos, struct enc_region, list));
> +			cond_resched();
>  		}
>  	}
>
>
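
For anyone skimming the thread, the pattern in the hunk above can be
sketched in userspace roughly as follows. This is only an illustration
under stated assumptions, not the kernel code: struct region,
region_add(), region_destroy_all() and yield_point() are hypothetical
names, and sched_yield() stands in for cond_resched(). The stand-in is
not exact, since cond_resched() only reschedules when one is pending
while sched_yield() always offers the CPU.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct enc_region. */
struct region {
	struct region *next;
	size_t npages;
};

struct region_list {
	struct region *first;
};

/* Userspace stand-in for cond_resched(): offer the CPU to other threads. */
static void yield_point(void)
{
	sched_yield();
}

/* Push a region onto the list; returns 0 on success, -1 if malloc fails. */
static int region_add(struct region_list *list, size_t npages)
{
	struct region *r = malloc(sizeof(*r));

	if (!r)
		return -1;
	r->npages = npages;
	r->next = list->first;
	list->first = r;
	return 0;
}

/* Release every region, yielding after each one; returns the count released. */
static size_t region_destroy_all(struct region_list *list)
{
	struct region *pos, *next;
	size_t released = 0;

	assert(list != NULL);
	for (pos = list->first; pos; pos = next) {
		next = pos->next;	/* saved before freeing, as list_for_each_safe() does */
		free(pos);		/* stands in for __unregister_enc_region_locked() */
		released++;
		yield_point();		/* one scheduling point per region */
	}
	list->first = NULL;
	return released;
}
```

A caller would fill the list with region_add() and then call
region_destroy_all() once at teardown. The point of the sketch is only
that the yield sits inside the loop body, so the time between scheduling
points is bounded by the cost of releasing a single region rather than
the whole list.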