Re: [PATCH 2/2] KVM: VMX: Extend VMX's #AC handding

From: Xiaoyao Li
Date: Sat Feb 01 2020 - 23:34:07 EST

Next message: Randy Dunlap: "Re: Latest Git kernel: avahi-daemon[2410]: ioctl(): Inappropriate ioctl for device"
Previous message: Brian Geffon: "Re: [PATCH v2] mm: Add MREMAP_DONTUNMAP to mremap()."
In reply to: Andy Lutomirski: "Re: [PATCH 2/2] KVM: VMX: Extend VMX's #AC handding"
Next in thread: Andy Lutomirski: "Re: [PATCH 2/2] KVM: VMX: Extend VMX's #AC handding"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2/2/2020 1:56 AM, Andy Lutomirski wrote:

On Feb 1, 2020, at 8:58 AM, Xiaoyao Li <xiaoyao.li@xxxxxxxxx> wrote:

On 2/1/2020 5:33 AM, Andy Lutomirski wrote:

On Jan 31, 2020, at 1:04 PM, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:

On Fri, Jan 31, 2020 at 12:57:51PM -0800, Andy Lutomirski wrote:

On Jan 31, 2020, at 12:18 PM, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:

This is essentially what I proposed a while back. KVM would allow enabling
split-lock #AC in the guest if and only if SMT is disabled or the enable bit
is per-thread, *or* the host is in "warn" mode (can live with split-lock #AC
being randomly disabled/enabled) and userspace has communicated to KVM that
it is pinning vCPUs.

How about covering the actual sensible case: host is set to fatal? In this
mode, the guest gets split lock detection whether it wants it or not. How do
we communicate this to the guest?

KVM doesn't advertise split-lock #AC to the guest and returns -EFAULT to the
userspace VMM if the guest triggers a split-lock #AC.

Effectively the same behavior as any other userspace process, just that KVM
explicitly returns -EFAULT instead of the process getting a SIGBUS.

Which helps how if the guest is actually SLD-aware?
I suppose we could make the argument that, if an SLD-aware guest gets #AC at CPL0, it's a bug, but it still seems rather nicer to forward the #AC to the guest instead of summarily killing it.

If KVM does advertise split-lock detection to the guest, then KVM/the host can tell whether a guest is SLD-aware by checking the guest's MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit.

- If the guest's MSR_TEST_CTRL.SPLIT_LOCK_DETECT is set, the guest is SLD-aware, so KVM forwards the #AC to the guest.

I disagree. If you advertise split-lock detection with the current core capability bit, it should *work*. And it won't. The choices you're actually giving the guest are:

a) Guest understands SLD and wants it on. The guest gets the same behavior as in bare metal.

b) Guest does not understand. Guest gets killed if it screws up as described below.

- If it is not set, the guest may be an old guest, a malicious guest, or a guest without SLD support, and we cannot tell which. So we have to kill the guest when the host is SLD-fatal, and let the guest survive when the host is SLD-warn, for the sake of old, sane but buggy guests.
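The two bullets above (bit set, bit not set) can be sketched as a small decision function. This is only an illustration and is not code from the patch: the enum names and `split_lock_ac_action()` are hypothetical, and the sketch assumes KVM already advertises split-lock detection to the guest. The MSR layout (MSR_TEST_CTRL is 0x33, split-lock detect is bit 29) is the one real detail used.

```c
#include <stdint.h>

/* MSR_TEST_CTRL (0x33): bit 29 enables #AC on split locks. */
#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT (1ULL << 29)

/* Hypothetical names, for illustration only. */
enum host_sld_mode { HOST_SLD_WARN, HOST_SLD_FATAL };

enum ac_action {
	AC_FORWARD_TO_GUEST,   /* guest enabled SLD itself: inject the #AC */
	AC_DISABLE_AND_RESUME, /* host warn: clear hardware bit, guest survives */
	AC_KILL_GUEST,         /* host fatal: report an error to userspace */
};

/*
 * Decide what to do with a split-lock #AC taken while the guest runs,
 * given the host mode and the guest's virtual MSR_TEST_CTRL value.
 */
static enum ac_action split_lock_ac_action(enum host_sld_mode host,
					   uint64_t guest_test_ctrl)
{
	/* Bit set: the guest is SLD-aware and asked for the #AC. */
	if (guest_test_ctrl & MSR_TEST_CTRL_SPLIT_LOCK_DETECT)
		return AC_FORWARD_TO_GUEST;

	/* Bit clear: old, non-SLD or malicious guest; we cannot tell which. */
	return host == HOST_SLD_FATAL ? AC_KILL_GUEST : AC_DISABLE_AND_RESUME;
}
```

The host-off case is left out on purpose, since no split-lock #AC is generated there.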

All true, but the result of running a Linux guest in SLD-warn mode will be broken.

In a word, all of the above assumes that KVM advertises split-lock detection to the guest. But this patch doesn't do that. Maybe I should add that part in v2.

I think you should think the details all the way through, and I think you're likely to determine that the Intel architecture team needs to do *something* to clean up this mess.

There are two independent problems here. First, SLD *can't* be virtualized sanely because it's per-core not per-thread.

Sadly, that's a fact we cannot change. So it's better to virtualize it only when SMT is disabled, to keep things simple.

Second, most users *won't want* to virtualize it correctly even if they could: if a guest is allowed to do split locks, it can DoS the system.

To avoid a DoS attack, the host must use sld_fatal mode. In that case, guests are forbidden from doing split locks.

So I think there should be an architectural way to tell a guest that SLD is on whether it likes it or not. And the guest, if booted with sld=warn, can print a message saying "haha, actually SLD is fatal" and carry on.

OK. Let me sort it out.

If SMT is disabled/unsupported, KVM advertises the SLD feature to the guest. Below are all the cases:

-----------------------------------------------------------------------
    Host    Guest   Guest behavior
-----------------------------------------------------------------------
1.  off             same as in bare metal
-----------------------------------------------------------------------
2.  warn    off     allow guest do split lock (for old guest):
                    hardware bit set initially, once split lock
                    happens, clear hardware bit when vcpu is running
                    So, it's the same as in bare metal

3.  warn    warn    1. user space: get #AC, then clear MSR bit, but
                       hardware bit is not cleared, #AC again, finally
                       clear hardware bit when vcpu is running.
                       So it's somehow the same as in bare-metal

                    2. kernel: same as in bare metal.

4.  warn    fatal   same as in bare metal
-----------------------------------------------------------------------
5.  fatal   off     guest is killed when split lock,
                    or forward #AC to guest, this way guest gets an
                    unexpected #AC

6.  fatal   warn    1. user space: get #AC, then clear MSR bit, but
                       hardware bit is not cleared, #AC again,
                       finally guest is killed, or KVM forwards #AC
                       to guest then guest gets an unexpected #AC.
                    2. kernel: same as in bare metal, call die();

7.  fatal   fatal   same as in bare metal
-----------------------------------------------------------------------

Based on the table above, if we want the guest to have the same behavior as on bare metal, we can set the host to sld_warn mode.
If we want to prevent DoS from the guest, we should set the host to sld_fatal mode.
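To make the table easier to check, here is a minimal sketch that maps a (host mode, guest mode) pair to its case number. The names `enum sld_mode`, `sld_table_case()` and `sld_may_differ_from_bare_metal()` are made up for this illustration and are not part of the patch.

```c
#include <stdbool.h>

/* Hypothetical names; the values are used as offsets below. */
enum sld_mode { SLD_OFF = 0, SLD_WARN = 1, SLD_FATAL = 2 };

/* Return the case number (1..7) from the table for a host/guest pair. */
static int sld_table_case(enum sld_mode host, enum sld_mode guest)
{
	if (host == SLD_OFF)
		return 1;	/* guest mode does not matter */

	/* warn host covers cases 2..4, fatal host covers cases 5..7 */
	return (host == SLD_WARN ? 2 : 5) + (int)guest;
}

/*
 * Cases 5 and 6 are the ones where the guest can be killed or can see
 * an unexpected #AC, i.e. where it may not behave as on bare metal.
 */
static bool sld_may_differ_from_bare_metal(enum sld_mode host,
					   enum sld_mode guest)
{
	int c = sld_table_case(host, guest);

	return c == 5 || c == 6;
}
```

With a warn host the helper never reports a difference, and with a fatal host it does so only for guests that are off or in warn mode, which matches the two conclusions above.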

Now, let's analyze what happens if there is an architectural way to tell a guest that SLD is forced on. Assume it's an SLD_forced_on cpuid bit.

- Host is sld_off: the SLD_forced_on cpuid bit is not set, no change for
  case #1.

- Host is sld_fatal: the SLD_forced_on cpuid bit must be set:
    - if the guest is SLD-aware, it is supposed to use only fatal
      mode, which goes to case #7. Using warn mode is not recommended
      for the guest; if the guest persists, it goes to case #6.

    - if the guest is not SLD-aware, maybe it's an old guest or a
      malicious guest that pretends not to be SLD-aware; it goes to
      case #5.

- Host is sld_warn: we have two choices:
    - set the SLD_forced_on cpuid bit: it's the same as when the host is fatal.
    - do not set the SLD_forced_on cpuid bit: it's the same as cases #2, #3, #4.
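The analysis above could be sketched as follows. Everything here is hypothetical: the SLD_forced_on cpuid bit does not exist today, and `sld_forced_on()`, `recommended_guest_mode()` and the `warn_host_forces` policy knob are names invented for this sketch.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum sld_mode { SLD_OFF = 0, SLD_WARN = 1, SLD_FATAL = 2 };

/*
 * Would the host expose the (hypothetical) SLD_forced_on cpuid bit?
 * A fatal host must set it, an off host never does, and a warn host
 * may choose either way (warn_host_forces).
 */
static bool sld_forced_on(enum sld_mode host, bool warn_host_forces)
{
	if (host == SLD_FATAL)
		return true;
	if (host == SLD_WARN)
		return warn_host_forces;
	return false;
}

/*
 * Mode an SLD-aware guest is expected to end up in. When the bit is
 * set, the guest should use fatal mode whatever was requested on its
 * command line; otherwise it keeps the requested mode.
 */
static enum sld_mode recommended_guest_mode(bool forced_on,
					    enum sld_mode requested)
{
	return forced_on ? SLD_FATAL : requested;
}
```

A guest that ignores the recommendation and stays in warn mode on a forced-on host still ends up in case #6, and a guest that is not SLD-aware never reads the bit at all, so it stays in case #5.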

So I think introducing an architectural way to tell a guest that SLD is forced on makes only one difference: there is a way to tell the guest not to use warn mode, thus eliminating case #6.

If you think it really matters, I can forward this requirement to our Intel architecture people.

ISTM, on an SLD-fatal host with an SLD-aware guest, the host should tell the guest "hey, you may not do split locks; SLD is forced on" and the guest should somehow acknowledge it so that it sees the architectural behavior instead of something we made up. Hence my suggestion.
