Re: [RESEND RFC 0/2] Paravirtualized Control Register pinning
From: Paolo Bonzini
Date: Sat Dec 21 2019 - 09:00:01 EST

On 20/12/19 20:26, John Andersen wrote:
> Paravirtualized CR pinning will likely be incompatible with kexec for
> the foreseeable future. Early boot code could possibly be changed to
> not clear protected bits. However, a kernel that requests CR bits be
> pinned can't know if the kernel it's kexecing has been updated to not
> clear protected bits. This would result in the kernel being kexec'd
> almost immediately receiving a general protection fault.
>
> Security-conscious kernel configurations disable kexec already, per KSPP
> guidelines. Projects such as Kata Containers, AWS Lambda, ChromeOS
> Termina, and others using KVM to virtualize Linux will benefit from
> this protection.
>
> The usage of SMM in SeaBIOS was explored as a way to communicate to KVM
> that a reboot has occurred and it should zero the pinned bits. When
> using QEMU and SeaBIOS, SMM initialization occurs on reboot. However,
> prior to SMM initialization, BIOS writes zero values to CR0, causing a
> general protection fault to be sent to the guest before SMM can signal
> that the machine has booted.

SMM is optional; I think it makes sense to leave it to userspace to
reset pinning (including for the case of triple faults), while INIT,
which is handled within KVM, would keep it active.

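To make the proposed split concrete, here is a rough sketch in plain C,
not actual KVM code. The struct, the enum and the function name are all
hypothetical and only model the rule above: a reset driven by userspace
drops the pinned bits, an INIT handled inside KVM leaves them alone.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-vCPU pinning state; not KVM's actual data structure. */
struct pin_state {
	uint64_t cr0_pinned;	/* CR0 bits the guest asked to pin */
	uint64_t cr4_pinned;	/* CR4 bits the guest asked to pin */
};

/* Hypothetical classification of who performs the reset. */
enum reset_source {
	RESET_INIT,		/* INIT, handled within KVM */
	RESET_USERSPACE,	/* full reset by userspace, e.g. reboot or triple fault */
};

/*
 * Apply the proposed policy: only a userspace-driven reset clears the
 * pinned bits, so freshly booted firmware can write CR0/CR4 freely.
 */
static void pin_state_on_reset(struct pin_state *ps, enum reset_source src)
{
	assert(ps);

	if (src == RESET_USERSPACE) {
		ps->cr0_pinned = 0;
		ps->cr4_pinned = 0;
	}
	/* RESET_INIT: pinning stays active, nothing to do. */
}
```

With this model, calling pin_state_on_reset() with RESET_INIT leaves both
masks untouched, while RESET_USERSPACE zeroes them. How userspace would
actually request the clear is left open here.
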
> Pinning of sensitive CR bits has already been implemented to protect
> against exploits directly calling native_write_cr*(). The current
> protection cannot stop ROP attacks which jump directly to a MOV CR
> instruction. Guests running with paravirtualized CR pinning are now
> protected against the use of ROP to disable CR bits. The same bits that
> are being pinned natively may be pinned via the CR pinned MSRs. These
> bits are WP in CR0, and SMEP, SMAP, and UMIP in CR4.
>
> Future patches could protect bits in MSRs in a similar fashion. The NXE
> bit of the EFER MSR is a prime candidate.
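
For reference, the check described in the quoted text boils down to
something like the sketch below. This is an illustration under
assumptions, not the code from the patches: the helper names are made up,
and it assumes pinned bits are pinned to 1, so that a write violates the
pin when it would clear any of them. The bit positions are the
architectural ones for WP, UMIP, SMEP and SMAP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions of the bits named in the cover letter. */
#define CR0_WP		(1ULL << 16)
#define CR4_UMIP	(1ULL << 11)
#define CR4_SMEP	(1ULL << 20)
#define CR4_SMAP	(1ULL << 21)

/*
 * Hypothetical helper: returns true if writing new_val to a control
 * register would clear at least one bit in the pinned mask. In the
 * scheme described above, such a write would get a #GP instead of
 * taking effect.
 */
static bool cr_write_clears_pinned(uint64_t new_val, uint64_t pinned)
{
	return (pinned & ~new_val) != 0;
}

/* A few example cases, written as assertions. */
static void pin_check_examples(void)
{
	const uint64_t cr4_pinned = CR4_SMEP | CR4_SMAP | CR4_UMIP;

	/* Setting an unrelated bit (CR4.PAE, bit 5) keeps all pinned bits. */
	assert(!cr_write_clears_pinned(cr4_pinned | (1ULL << 5), cr4_pinned));

	/* Dropping SMEP, as a ROP chain might try, violates the pin. */
	assert(cr_write_clears_pinned(cr4_pinned & ~CR4_SMEP, cr4_pinned));

	/* Writing zero to CR0, as the BIOS does early on, clears WP. */
	assert(cr_write_clears_pinned(0, CR0_WP));
}
```

The last case is the one that bites kexec and reboot: early code that
writes a value without the pinned bits faults before it can do anything
else.
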

Please include patches for either kvm-unit-tests or
tools/testing/selftests/kvm that test the functionality.

Paolo