Re: [RESEND RFC 0/2] Paravirtualized Control Register pinning

From: Andersen, John S
Date: Tue Dec 24 2019 - 14:45:10 EST


On Mon, 2019-12-23 at 16:48 +0200, Liran Alon wrote:
> > On 20 Dec 2019, at 21:26, John Andersen <john.s.andersen@xxxxxxxxx>
> > wrote:
> >
> > Pinning is not active when running in SMM. Entering SMM disables
> > pinned
> > bits, writes to control registers within SMM would therefore
> > trigger
> > general protection faults if pinning was enforced.
>
> For compatibility reasons, itâs reasonable that pinning wonât be
> active when running in SMM.
> However, I do think we should not allow vSMM code to change pinned
> values when returning back from SMM.
> This would prevent a vulnerable vSMI handler from modifying vSMM
> state-area to modify CR4 when running outside of vSMM.
> I believe in this case itâs legit to just forcibly restore original
> CR0/CR4 pinned values. Ignoring vSMM changes.
>

In em_rsm could we just OR with the value of the PINNED MSRs right
before the final return?

> > The guest may never read pinned bits. If an attacker were to read
> > the
> > CR pinned MSRs, they might decide to preform another attack which
> > would
> > not cause a general protection fault.
>
> I disagree with this statement.
> An attacker knows what is the system it is attacking and can deduce
> by that which bits it pinnedâ
> Therefore, protecting from guest reading these is not important at
> all.
>

Sure, I'll make it readable.

> > Should userspace expose the CR pining CPUID feature bit, it must
> > zero CR
> > pinned MSRs on reboot. If it does not, it runs the risk of having
> > the
> > guest enable pinning and subsequently cause general protection
> > faults on
> > next boot due to early boot code setting control registers to
> > values
> > which do not contain the pinned bits.
>
> Why reset CR pinned MSRs by userspace instead of KVM INIT handling?
>
> > When running with KVM guest support and paravirtualized CR pinning
> > enabled, paravirtualized and existing pinning are setup at the same
> > point on the boot CPU. Non-boot CPUs setup pinning upon
> > identification.
> >
> > Guests using the kexec system call currently do not support
> > paravirtualized control register pinning. This is due to early boot
> > code writing known good values to control registers, these values
> > do
> > not contain the protected bits. This is due to CPU feature
> > identification being done at a later time, when the kernel properly
> > checks if it can enable protections.
> >
> > Most distributions enable kexec. However, kexec could be made boot
> > time
> > disableable. In this case if a user has disabled kexec at boot time
> > the guest will request that paravirtualized control register
> > pinning
> > be enabled. This would expand the userbase to users of major
> > distributions.
> >
> > Paravirtualized CR pinning will likely be incompatible with kexec
> > for
> > the foreseeable future. Early boot code could possibly be changed
> > to
> > not clear protected bits. However, a kernel that requests CR bits
> > be
> > pinned can't know if the kernel it's kexecing has been updated to
> > not
> > clear protected bits. This would result in the kernel being kexec'd
> > almost immediately receiving a general protection fault.
>
> Instead of disabling kexec entirely, I think it makes more sense to
> invent
> some generic mechanism in which new kernel can describe to old kernel
> a set of flags that specifies which features hand-over it supports.
> One of them
> being pinned CRs.
>
> For example, isnât this also relevant for IOMMU DMA protection?
> i.e. Doesnât old kernel need to know if it should disable or enable
> IOMMU DMAR
> before kexec to new kernel? Similar to EDK2 IOMMU DMA protection
> hand-over?

Great idea.

Making kexec work will require changes to these files and maybe more:

arch/x86/boot/compressed/head_64.S
arch/x86/kernel/head_64.S
arch/x86/kernel/relocate_kernel_64.S

Which my previous attempts showed different results when running
virtualized vs. unvirtualized. Specificity different behavior with SMAP
and UMIP bits.

This would be a longer process though. As validating that everything
still works in both the VM and on physical hosts will be required. As
it stands this patchset could pick up a fairly large userbase via the
virtualized container projects. Should we pursue kexec in this patchset
or a later one?

Thanks,
John