Re: [PATCH] KVM: SVM: Disable TDP MMU when running on Hyper-V

From: Jeremi Piotrowski
Date: Wed Apr 05 2023 - 12:43:55 EST


On 3/7/2023 6:36 PM, Sean Christopherson wrote:
> Thinking about this more, I would rather revert commit 1e0c7d40758b ("KVM: SVM:
> hyper-v: Remote TLB flush for SVM") or fix the thing properly straightaway. KVM
> doesn't magically handle the flushes correctly for the shadow/legacy MMU, KVM just
> happens to get lucky and not run afoul of the underlying bugs. The revert appears
> to be reasonably straightforward (see bottom).

Hi Sean,

I'm back, and I don't have good news. The fix for the missing Hyper-V TLB flushes has
landed in Linus' tree and I have now had the chance to test things outside Azure, in WSL
on my AMD laptop.

There is some seriously weird interaction going on between TDP MMU and Hyper-V, with
or without enlightened TLB. My laptop has 16 vCPUs, so the WSL VM also has 16 vCPUs.
I have hardcoded the kernel to disable enlightened TLB (so we know that is not interfering).
I'm running a Flatcar Linux VM inside the WSL VM using legacy BIOS, a single CPU
and 4GB of RAM.

If I run with `kvm.tdp_mmu=0`, I can boot and shut down my VM consistently in 20 seconds.

If I run with TDP MMU, the VM boot stalls for seconds at a time in various spots
(loading grub, decompressing the kernel, during kernel boot), and the boot output looks
like it is happening in slow motion. The fastest I have seen it finish the same cycle is
2 minutes. I have also seen it take 4 minutes, and sometimes it does not finish at all.
Everything else is the same; the only difference is the value of `kvm.tdp_mmu`.
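
When reproducing this, it may help to confirm which MMU is actually in use before
timing a boot cycle. The following is a minimal userspace sketch, not part of any
patch. It assumes the kvm module exposes the parameter at
/sys/module/kvm/parameters/tdp_mmu and reports it as Y or N; the path constant and
both helper names are made up for illustration.

```c
#include <stdio.h>

/* Assumed sysfs location of the parameter; adjust if your kernel differs. */
#define TDP_MMU_PARAM_PATH "/sys/module/kvm/parameters/tdp_mmu"

/*
 * Interpret a module-parameter string: 1 for enabled, 0 for disabled,
 * -1 if unrecognised. Only the first character is examined, so a
 * trailing newline is harmless. 1/0 is accepted alongside Y/N.
 */
static int parse_bool_param(const char *s)
{
	if (!s)
		return -1;

	switch (s[0]) {
	case 'Y': case 'y': case '1':
		return 1;
	case 'N': case 'n': case '0':
		return 0;
	default:
		return -1;
	}
}

/* Read the parameter from the given path; returns -1 if it is unreadable. */
static int tdp_mmu_state(const char *path)
{
	char buf[8] = "";
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);

	return parse_bool_param(buf);
}
```

Calling tdp_mmu_state(TDP_MMU_PARAM_PATH) before each run would record whether that
run used the TDP MMU (1), the legacy MMU (0), or could not tell (-1).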

So I would like to revisit disabling tdp_mmu on Hyper-V altogether for the time being,
though it should probably use the following condition:

tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled && !hypervisor_is_type(X86_HYPER_MS_HYPERV);
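
To make the difference from the quoted hack-a-fix concrete, here is a small userspace
model of the two gating conditions. It is only a sketch: struct host_state and both
function names are hypothetical, and the booleans stand in for the kernel-side checks
(hypervisor_is_type(X86_HYPER_MS_HYPERV) and the HV_X64_NESTED_ENLIGHTENED_TLB bit in
ms_hyperv.nested_features). It is not kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state KVM consults in kvm_configure_mmu(). */
struct host_state {
	bool tdp_mmu_allowed;	/* kvm.tdp_mmu module parameter */
	bool tdp_enabled;	/* TDP (NPT/EPT) is in use */
	bool on_hyperv;		/* models hypervisor_is_type(X86_HYPER_MS_HYPERV) */
	bool enlightened_tlb;	/* models the HV_X64_NESTED_ENLIGHTENED_TLB bit */
};

/* Condition from the quoted diff: gate only on the enlightened TLB feature. */
static bool enable_unless_enlightened_tlb(const struct host_state *h)
{
	return h->tdp_mmu_allowed && h->tdp_enabled && !h->enlightened_tlb;
}

/* Condition proposed above: gate on running under Hyper-V at all. */
static bool enable_unless_hyperv(const struct host_state *h)
{
	return h->tdp_mmu_allowed && h->tdp_enabled && !h->on_hyperv;
}
```

For a Hyper-V host where the enlightened TLB bit is not set, the first condition
still enables the TDP MMU while the second disables it. On bare metal both conditions
leave the TDP MMU enabled.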

Do you have an environment where you would be able to reproduce this? A Windows server perhaps
or an AMD laptop?

Jeremi

>
> And _if_ we want to hack-a-fix it, then I would strongly prefer a very isolated,
> obviously hacky fix, e.g.
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 36e4561554ca..a9ba4ae14fda 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5779,8 +5779,13 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>  	tdp_root_level = tdp_forced_root_level;
>  	max_tdp_level = tdp_max_root_level;
>
> +	/*
> +	 * FIXME: Remove the enlightened TLB restriction when KVM properly
> +	 * handles TLB flushes for said enlightenment.
> +	 */
>  #ifdef CONFIG_X86_64
> -	tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled;
> +	tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled &&
> +			  !(ms_hyperv.nested_features & HV_X64_NESTED_ENLIGHTENED_TLB);
>  #endif
>  	/*
>  	 * max_huge_page_level reflects KVM's MMU capabilities irrespective
>
>