Re: [PATCH] KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDR
From: Maxim Levitsky
Date: Wed May 04 2022 - 08:08:53 EST
On Tue, 2022-05-03 at 20:30 +0000, Sean Christopherson wrote:
> On Tue, May 03, 2022, Maxim Levitsky wrote:
> > On Tue, 2022-05-03 at 12:12 +0300, Maxim Levitsky wrote:
> > > On Mon, 2022-05-02 at 16:51 +0000, Sean Christopherson wrote:
> > > > On Mon, May 02, 2022, Maxim Levitsky wrote:
> > > > > On Mon, 2022-05-02 at 10:59 +0300, Maxim Levitsky wrote:
> > > > > > > > Also, I can reproduce it all the way back to the 5.14 kernel (the last kernel I have installed in this VM).
> > > > > > > >
> > > > > > > > I tested kvm/queue as of today; sadly, I still see the warning.
> > > > > > >
> > > > > > > Due to a race, the above statements are out of order ;-)
> > > > > >
> > > > > > So further investigation shows that the trigger for this *is* cpu-pm=on :(
> > > > > >
> > > > > > So this is enough to trigger the warning when run in the guest:
> > > > > >
> > > > > > qemu-system-x86_64 -nodefaults -vnc none -serial stdio -machine accel=kvm \
> > > > > > -kernel x86/dummy.flat -machine kernel-irqchip=on -smp 8 -m 1g -cpu host \
> > > > > > -overcommit cpu-pm=on
>
> ...
>
> > > > > All right, at least that was because I removed the '-device isa-debug-exit,iobase=0xf4,iosize=0x4',
> > > > > which is apparently used by kvm-unit-tests to signal exit from the VM.
> > > >
> > > > Can you provide your QEMU command line for running your L1 VM? And your L0 and L1
> > > > Kconfigs too? I've tried both the dummy and ipi_stress tests on a variety of hardware,
> > > > kernels, QEMUs, etc..., with no luck.
> > >
> > > So now both L0 and L1 run almost pure kvm/queue
> > > (commit 2764011106d0436cb44702cfb0981339d68c3509)
> > >
> > > I have some local patches but they are not relevant to KVM at all, more
> > > like various tweaks to sensors, a sad hack for yet another regression
> > > in AMDGPU, etc.
> > >
> > > The config and qemu command line are attached.
> > >
> > > AVIC is disabled in L0, and L0's qemu is from upstream master.
> > > The bug reproduces all too well IMHO, almost always.
> > >
> > > For reference, the warning is printed in L1's dmesg.
> >
> > Tested this without any preemption in L0 and L1 - the bug still reproduces just fine.
> > (kvm/queue)
>
> Well, I officially give up; I'm out of ideas to try and repro this on my end. To
> try and narrow the search, maybe try processing "all" possible gfns and see if that
> makes the leak go away?
>
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 7e258cc94152..a354490939ec 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -84,9 +84,7 @@ static inline gfn_t kvm_mmu_max_gfn(void)
> * than hardware's real MAXPHYADDR. Using the host MAXPHYADDR
> * disallows such SPTEs entirely and simplifies the TDP MMU.
> */
> - int max_gpa_bits = likely(tdp_enabled) ? shadow_phys_bits : 52;
> -
> - return (1ULL << (max_gpa_bits - PAGE_SHIFT)) - 1;
> + return (1ULL << (52 - PAGE_SHIFT)) - 1;
> }
>
> static inline u8 kvm_get_shadow_phys_bits(void)
>
Nope, still reproduces.
I'll think about how to trace this; maybe that will give me some ideas.
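
One idea might be to capture the kvmmmu tracepoints in L1 while the test runs
and then match the pages that were allocated against the ones that were zapped,
something along these lines (assuming trace-cmd is available in the guest; the
test invocation is just a placeholder):

  trace-cmd record -e kvmmmu <run the failing test here>
  trace-cmd report | grep -e kvm_mmu_get_page -e kvm_mmu_prepare_zap_page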
Is there anything useful to dump from the mmu pages that are still not freed at that point?
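
Something like the below is what I have in mind as a first pass - a completely
untested sketch that assumes the leaked pages are still reachable via
kvm->arch.active_mmu_pages (TDP MMU pages may be tracked on a different list,
in which case the walk would need adjusting), and kvm_dump_unfreed_mmu_pages
is just a made-up name:

static void kvm_dump_unfreed_mmu_pages(struct kvm *kvm)
{
        struct kvm_mmu_page *sp;

        /*
         * Walk the shadow pages KVM still considers active and print
         * the fields that might hint at why they were never zapped.
         */
        list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link)
                pr_warn("unfreed sp: gfn=0x%llx level=%u invalid=%u\n",
                        sp->gfn, sp->role.level, sp->role.invalid);
}

The idea would be to call it from the VM teardown path, right before the point
where the warning fires.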
Also, do you test on AMD? I test on my 3970X.
Best regards,
Maxim Levitsky