Re: [PATCH 4/4] X86: Use KVM CR pin MSRs

From: Andersen, John
Date: Fri Jul 03 2020 - 17:51:50 EST


> > Is there a plan for fixing this for real? I'm wondering if there is a
> > sane weakening of this feature that still allows things like kexec.
> >
>
> I'm pretty sure kexec can be fixed. I had it working at one point, I'm
> currently in the process of revalidating this. The issue was though that
> kexec only worked within the guest, not on the physical host, which I suspect
> is related to the need for supervisor pages to be mapped, which seems to be
> required before enabling SMAP (based on what I'd seen with the selftests and
> unittests). I was also just blindly turning on the bits without checking for
> support when I'd tried this, so that could have been the issue too.
>
> I think most of the changes for just blindly enabling the bits were in
> relocate_kernel, secondary_startup_64, and startup_32.
>

So I have a naive fix for kexec which has only been tested to work under KVM.
When tested on a physical host, it did not boot when SMAP or UMIP were set.
Undoubtedly it's not the correct way to do this, as it skips CPU feature
identification, opting instead for blindly setting the bits. The physical host
I tested this on does not have UMIP so that's likely why it failed to boot when
UMIP gets set blindly. Within kvm-unit-tests, the test for SMAP maps memory as
supervisor pages before enabling SMAP. I suspect this is why setting SMAP
blindly causes the physical host not to boot.

Within trampoline_32bit_src() if I add more instructions I get an error
about "attempt to move .org backwards", which as I understand it means
there are only so many instructions allowed in each of those functions.

My suspicion is that someone with more knowledge of this area has a good
idea on how best to handle this. Feedback would be much appreciated.

> > There's no SMEP or SMAP in real mode, and real mode has basically no security
> > mitigations at all.
> >
>
> We'd thought about the switch to real mode being a case where we'd want to drop
> pinning. However, we weren't sure how much weaker, if at all, it makes this
> protection.
>
> Unless someone knows, I'll probably need to do some digging into what an
> exploit might look like that tries switching to real mode and switching back as
> a way around this protection.
>

TL;DR We probably shouldn't use the switch to real mode as a trigger to drop
pinning.

This protection assumes that the attacker is at the point where they have the
ability to write a payload for a ROP/JOP attack and gain control of execution.

For this case where we are going to switch to real mode we need to add an
assumption that the attacker has a write primitive that allows them to write
part of their payload to memory that will be addressable within 16 bit mode.

If the attacker has this write primitive, the attack becomes write payloads,
within the first stage, switch to real mode, use stage two within real mode via
JOP or just machine code (since there's we don't have to worry NX) to setup
protected mode and jump back into the kernel with protections disabled.

> > PCID is an odd case. I see no good reason to pin it, and pinning PCID
> > on prevents use of 32-bit mode.
>
> Maybe it makes sense to default to the values we have, but allow host userspace
> to overwrite the allowed values, in case some other guest OS wants to do
> something that Linux doesn't with PCID or other bits.

In the next version of this patchset I've made it so that the default allowed
values are WP, SMEP, SMAP, and UMIP. However, a write to the allowed MSR from
the host VMM (QEMU) can change which bits are allowed.