Re: [PATCH 1/2] KVM: nVMX: fix CR4_READ_SHADOW when L0 updates CR4 during a signal
From: Thomas Prescher
Date: Tue Apr 16 2024 - 11:08:50 EST
Hi Sean,
On Tue, 2024-04-16 at 07:35 -0700, Sean Christopherson wrote:
> On Tue, Apr 16, 2024, Julian Stecklina wrote:
> > From: Thomas Prescher <thomas.prescher@xxxxxxxxxxxxxxxxxxxxx>
> >
> > This issue occurs when the kernel is interrupted by a signal while
> > running a L2 guest. If the signal is meant to be delivered to the L0
> > VMM, and L0 updates CR4 for L1, i.e. when the VMM sets
> > KVM_SYNC_X86_SREGS in kvm_run->kvm_dirty_regs, the kernel programs an
> > incorrect read shadow value for L2's CR4.
> >
> > The result is that the guest can read a value for CR4 where bits from
> > L1 have leaked into L2.
>
> No, this is a userspace bug. If L2 is active when userspace stuffs register
> state, then from KVM's perspective the incoming value is L2's value. E.g. if
> userspace *wants* to update L2 CR4 for whatever reason, this patch would
> result in L2 getting a stale value, i.e. the value of CR4 at the time of
> VM-Enter.
>
> And even if userspace wants to change L1, this patch is wrong, as KVM is
> writing vmcs02.GUEST_CR4, i.e. is clobbering the L2 CR4 that was programmed
> by L1, *and* is dropping the CR4 value that userspace wanted to stuff for L1.
>
> To fix this, your userspace needs to either wait until L2 isn't active, or
> force the vCPU out of L2 (which isn't easy, but it's doable if absolutely
> necessary).

What you say makes sense. Is there any way for userspace to detect whether
L2 is currently active after returning from KVM_RUN? I couldn't find anything
in the official documentation (https://docs.kernel.org/virt/kvm/api.html).
Can you point me in the right direction?

>
> Pulling in a snippet from the initial bug report[*],
>
> : The reason why this triggers in VirtualBox and not in Qemu is that there
> : are cases where VirtualBox marks CR4 dirty even though it hasn't changed.
>
> simply not trying to mark register state dirty when L2 is active sounds
> like it would resolve the issue.
>
> [*] https://lore.kernel.org/all/af2ede328efee9dc3761333bd47648ee6f752686.camel@xxxxxxxxxxxxxxxxxxxxx
>
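
On the VMM side, that would mean only setting KVM_SYNC_X86_SREGS when a
value really changed. A rough sketch of such a check follows. The names
struct vmm_sregs, vmm_dirty_regs() and VMM_SYNC_X86_SREGS are made up for
illustration, the bit value is assumed to match KVM_SYNC_X86_SREGS in the
x86 uapi header, and only a handful of registers are compared:

```c
#include <stdint.h>

/* Assumed to mirror KVM_SYNC_X86_SREGS from the x86 uapi header. */
#define VMM_SYNC_X86_SREGS (UINT64_C(1) << 1)

/* Hypothetical subset of the special registers a VMM keeps cached. */
struct vmm_sregs {
	uint64_t cr0;
	uint64_t cr3;
	uint64_t cr4;
	uint64_t efer;
};

/*
 * Return the dirty-regs bits to hand to KVM: the sregs bit is set only
 * if at least one cached register differs from what was last synced.
 */
static uint64_t vmm_dirty_regs(const struct vmm_sregs *last_synced,
			       const struct vmm_sregs *wanted)
{
	if (last_synced->cr0 != wanted->cr0 ||
	    last_synced->cr3 != wanted->cr3 ||
	    last_synced->cr4 != wanted->cr4 ||
	    last_synced->efer != wanted->efer)
		return VMM_SYNC_X86_SREGS;

	return 0;
}
```

The result would be assigned to kvm_run->kvm_dirty_regs before KVM_RUN, so
an unchanged CR4 never causes a sreg sync while L2 happens to be active.
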
> > We found this issue by running uXen [1] as L2 in VirtualBox/KVM [2].
> > The issue can also easily be reproduced in Qemu/KVM if we force a sreg
> > sync on each call to KVM_RUN [3]. The issue can also be reproduced by
> > running a L2 Windows 10. In the Windows case, CR4.VMXE leaks from L1
> > to L2 causing the OS to blue-screen with a kernel thread exception
> > during TLB invalidation where the following code sequence triggers the
> > issue:
> >
> > mov rax, cr4 <--- L2 reads CR4 with contents from L1
> > mov rcx, cr4
> > btc 0x7, rax <--- L2 toggles CR4.PGE
> > mov cr4, rax <--- #GP because L2 writes CR4 with reserved bits set
> > mov cr4, rcx
> >
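
For anyone following along, the sequence above can be modeled in a few
lines of C. This is only a sketch of the architectural rule (for bits set
in the CR4 guest/host mask, a guest read of CR4 returns the read shadow;
all other bits come from the real guest CR4). The helper names, and the
idea of passing the bits L2 treats as reserved as a plain mask, are
assumptions for illustration and not KVM code:

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_PGE  (UINT64_C(1) << 7)   /* CR4.PGE, bit 7 */
#define CR4_VMXE (UINT64_C(1) << 13)  /* CR4.VMXE, bit 13 */

/*
 * Value a guest "mov reg, cr4" observes: bits set in the guest/host mask
 * are taken from the read shadow, all other bits from the real guest CR4.
 */
static uint64_t guest_visible_cr4(uint64_t guest_cr4, uint64_t guest_host_mask,
				  uint64_t read_shadow)
{
	return (read_shadow & guest_host_mask) | (guest_cr4 & ~guest_host_mask);
}

/*
 * Hypothetical check: a CR4 write faults if it sets any bit that the
 * caller says is reserved from the guest's point of view.
 */
static bool cr4_write_faults(uint64_t new_cr4, uint64_t reserved_bits)
{
	return (new_cr4 & reserved_bits) != 0;
}
```

With a mask of CR4_VMXE and a read shadow taken from L1 (VMXE set),
guest_visible_cr4() hands VMXE to L2. Toggling PGE and writing that value
back then trips cr4_write_faults() if VMXE is reserved for L2, which is
the #GP in the trace. With the shadow L1 asked for (VMXE clear), the same
write goes through.
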
> > The existing code seems to fix up CR4_READ_SHADOW after calling
> > vmx_set_cr4() except in __set_sregs_common(). While we could fix it there
> > as well, it's easier to just handle it centrally.
> >
> > There might be a similar issue with CR0.
> >
> > [1] https://github.com/OpenXT/uxen
> > [2] https://github.com/cyberus-technology/virtualbox-kvm
> > [3] https://github.com/tpressure/qemu/commit/d64c9d5e76f3f3b747bea7653d677bd61e13aafe
> >
> > Signed-off-by: Julian Stecklina <julian.stecklina@xxxxxxxxxxxxxxxxxxxxx>
> > Signed-off-by: Thomas Prescher <thomas.prescher@xxxxxxxxxxxxxxxxxxxxx>
>
> SoB is reversed, yours should come after Thomas'.
>
> > ---
> > arch/x86/kvm/vmx/vmx.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 6780313914f8..0d4af00245f3 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -3474,7 +3474,11 @@ void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> >  		hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
> >  	}
> >
> > -	vmcs_writel(CR4_READ_SHADOW, cr4);
> > +	if (is_guest_mode(vcpu))
> > +		vmcs_writel(CR4_READ_SHADOW, nested_read_cr4(get_vmcs12(vcpu)));
> > +	else
> > +		vmcs_writel(CR4_READ_SHADOW, cr4);
> > +
> >  	vmcs_writel(GUEST_CR4, hw_cr4);
> >
> >  	if ((cr4 ^ old_cr4) & (X86_CR4_OSXSAVE | X86_CR4_PKE))
> > --
> > 2.43.2
> >