Re: [PATCH] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow
From: Andrew Cooper
Date: Wed Jan 19 2022 - 22:18:20 EST
On 20/01/2022 00:06, Sean Christopherson wrote:
> Set vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS, a.k.a. the pending single-step
> breakpoint flag, when re-injecting a #DB with RFLAGS.TF=1, and STI or
> MOVSS blocking is active. Setting the flag is necessary to make VM-Entry
> consistency checks happy, as VMX has an invariant that if RFLAGS.TF is
> set and STI/MOVSS blocking is true, then the previous instruction must
> have been STI or MOV/POP, and therefore a single-step #DB must be pending
> since the RFLAGS.TF cannot have been set by the previous instruction,
> i.e. the one instruction delay after setting RFLAGS.TF must have already
> expired.
>
> Normally, the CPU sets vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS appropriately
> when recording guest state as part of a VM-Exit, but #DB VM-Exits
> intentionally do not treat the #DB as "guest state" as interception of
> the #DB effectively makes the #DB host-owned, thus KVM needs to manually
> set PENDING_DBG.BS when forwarding/re-injecting the #DB to the guest.
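For anyone following along, the rule in the quoted changelog can be sketched as below. This is only an illustration under stated assumptions, not the actual KVM change: the function name and macro names are hypothetical, and the bit positions (RFLAGS.TF = bit 8, interruptibility-state STI/MOV-SS blocking = bits 0/1, pending debug exceptions BS = bit 14) are taken from the SDM layout as I understand it.

```c
#include <stdint.h>

/* Hypothetical names; bit positions assumed from the SDM. */
#define EFLAGS_TF          (1u << 8)    /* RFLAGS.TF, single-step */
#define INTR_STATE_STI     (1u << 0)    /* blocking by STI */
#define INTR_STATE_MOV_SS  (1u << 1)    /* blocking by MOV/POP SS */
#define PENDING_DBG_BS     (1ull << 14) /* pending single-step #DB */

/*
 * Return the pending debug exceptions value to use when re-injecting a
 * #DB: BS is forced on when TF is set and an STI/MOVSS shadow is active,
 * otherwise the value is returned unchanged.
 */
static uint64_t fixup_pending_dbg(uint64_t rflags, uint32_t intr_state,
                                  uint64_t pending_dbg)
{
    if ((rflags & EFLAGS_TF) &&
        (intr_state & (INTR_STATE_STI | INTR_STATE_MOV_SS)))
        pending_dbg |= PENDING_DBG_BS;

    return pending_dbg;
}
```

With TF set and MOVSS blocking active the result has bit 14 set; with either condition absent the input value passes through untouched.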
The problem is that none of this is documented.
Amongst other things, the vmentry consistency check misses the case when
#DB really is pending in ENTRY_INTR_INFO.
It is very clear that to use VT-x/SVM correctly, required reading
includes the core microcode and RTL, which of course all of us have
access to...
> Note, although this bug can be triggered by guest userspace, doing so
> requires IOPL=3, and guest userspace running with IOPL=3 has full access
> to all I/O ports (from the guest's perspective) and can crash/reboot the
> guest any number of ways. IOPL=3 is required because STI blocking kicks
> in if and only if RFLAGS.IF is toggled 0=>1, and if CPL>IOPL, STI either
> takes a #GP or modifies RFLAGS.VIF, not RFLAGS.IF.
>
> MOVSS blocking can be initiated by userspace, but can be coincident with
> a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is
> executed in the MOVSS shadow. MOV DR #GPs at CPL>0, thus MOVSS blocking
> is problematic only for CPL0 (and only if the guest is crazy enough to
> access a DR in a MOVSS shadow). All other sources of #DBs are either
> suppressed by MOVSS blocking (single-step, code fetch, data, and I/O),
It is more complicated than this and undocumented. Single step is
discarded in a shadow, while data breakpoints are deferred.
I've just run an experiment with code breakpoints, as they're faults
like General Detect:
bool do_unhandled_exception(struct cpu_regs *regs)
{
    static int limit;

    if ( limit++ > 10 )
        return false;

    if ( regs->entry_vector == X86_EXC_DB )
    {
        unsigned int pending_dbg = read_dr6() ^ X86_DR6_DEFAULT;
        unsigned int dr7 = read_dr7(), spurious = 0;

        for ( int i = 0; i < 4; ++i )
            if ( pending_dbg & (1 << i) && ((dr7 >> (2 * i)) & 3) == 0 )
                spurious |= (1 << i);

        printk("#DB at %04x:%p, pending %08x, spurious %x\n",
               regs->cs, _p(regs->ip), pending_dbg ^ spurious, spurious);

        write_dr6(X86_DR6_DEFAULT);

        return true;
    }

    return false;
}
void test_main(void)
{
    extern char l0[] asm ("0f"), l1[] asm ("1f");
    extern char l2[] asm ("2f"), l3[] asm ("3f");
    unsigned int tmp;

    write_cr4(read_cr4() | X86_CR4_DE);

    write_dr0(_u(l0));
    write_dr1(_u(l1));
    write_dr2(_u(l2));
    write_dr3(_u(l3));
    write_dr7(/* DR7_SYM(0, G, X) | */
              /* DR7_SYM(1, G, X) | */
              DR7_SYM(2, G, X) |
              /* DR7_SYM(3, G, X) | */
              X86_DR7_GE);

    asm volatile("mov %%ss, %[tmp];"
                 "pushf;"
                 "pushf;"
                 "orl $"STR(X86_EFLAGS_TF)", (%%"_ASM_SP");"
                 "popf;"
                 "nop;"
                 "0: nop;"
                 "1: mov %[tmp], %%ss;"
                 "2: nop;"
                 "3: popf;"
                 : [tmp] "=r" (tmp));

    /* If the VM is still alive, it didn't suffer a vmentry failure. */
    xtf_success("Success: Not vulnerable to XSA-308\n");
}
$ objdump -d tests/xsa-308/test-hvm64-xsa-308 | grep -A25 '<test_main>:'
001048a0 <test_main>:
  1048a0:  0f 20 e0              mov    %cr4,%rax
  1048a3:  48 83 c8 08           or     $0x8,%rax
  1048a7:  0f 22 e0              mov    %rax,%cr4
  1048aa:  b8 df 48 10 00        mov    $0x1048df,%eax
  1048af:  0f 23 c0              mov    %rax,%db0
  1048b2:  b8 e0 48 10 00        mov    $0x1048e0,%eax
  1048b7:  0f 23 c8              mov    %rax,%db1
  1048ba:  b8 e2 48 10 00        mov    $0x1048e2,%eax
  1048bf:  0f 23 d0              mov    %rax,%db2
  1048c2:  b8 e3 48 10 00        mov    $0x1048e3,%eax
  1048c7:  0f 23 d8              mov    %rax,%db3
  1048ca:  b8 20 02 00 00        mov    $0x220,%eax
  1048cf:  0f 23 f8              mov    %rax,%db7
  1048d2:  8c d0                 mov    %ss,%eax
  1048d4:  9c                    pushf
  1048d5:  9c                    pushf
  1048d6:  81 0c 24 00 01 00 00  orl    $0x100,(%rsp)
  1048dd:  9d                    popf
  1048de:  90                    nop
  1048df:  90                    nop
  1048e0:  8e d0                 mov    %eax,%ss
  1048e2:  90                    nop
  1048e3:  9d                    popf
  1048e4:  bf 00 3e 11 00        mov    $0x113e00,%edi
  1048e9:  31 c0                 xor    %eax,%eax
gives
--- Xen Test Framework ---
Environment: HVM 64bit (Long mode 4 levels)
XSA-308 PoC
#DB at 0008:00000000001048df, pending 00004000, spurious 1
#DB at 0008:00000000001048e0, pending 00004000, spurious 2
#DB at 0008:00000000001048e3, pending 00004000, spurious 8
#DB at 0008:00000000001048e4, pending 00004000, spurious 0
Success: Not vulnerable to XSA-308
which suggests that the active code breakpoint in the MovSS shadow is
discarded too, because there is no #DB at the 0x1048e2 boundary.
This test is obscured by another bug/misfeature/something where the
B{0..3} get lost on vmexit if BT is also set.
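To make the "spurious" column in the output easier to read, here is the same classification the handler performs, pulled out as a standalone sketch. The function name is hypothetical. A B{0..3} bit counts as spurious when the matching breakpoint has neither its L nor its G enable bit set in DR7. With the DR7 value of 0x220 seen in the disassembly (G2 plus GE), only breakpoint 2 is armed, so B0, B1 and B3 are reported as spurious.

```c
#include <stdint.h>

/*
 * Return the subset of B0..B3 in pending_dbg whose breakpoint is not
 * enabled in dr7 (both the L and G bits for that breakpoint are clear).
 * pending_dbg is assumed to be DR6 with the default bits already
 * XOR-ed out, as in the handler.
 */
static unsigned int spurious_bp_bits(unsigned int pending_dbg,
                                     unsigned int dr7)
{
    unsigned int spurious = 0;

    for (int i = 0; i < 4; ++i)
        if ((pending_dbg & (1u << i)) && ((dr7 >> (2 * i)) & 3) == 0)
            spurious |= 1u << i;

    return spurious;
}
```

For example, a pending value of 0x4001 (BS plus B0) with DR7 = 0x220 yields 1, which lines up with the first line of the log, while 0x4004 (BS plus B2) would yield 0 because breakpoint 2 is the armed one.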
> are mutually exclusive with MOVSS blocking (T-bit task switch),
How so? MovSS prevents external interrupts from triggering task
switches, but instruction sources still trigger in a shadow.
> or are
> already handled by KVM (ICEBP, a.k.a. INT1).
Other sources of #DB include RTM debugging, with errata causing a
fault-style #DB pointing at the XBEGIN instruction, rather than
vectoring to the abort handler, and splitlock which is new since I last
thought about this problem.
> This bug was originally found by running tests[1] created for XSA-308[2].
> Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is
> presumably why the Xen bug was deemed to be an exploitable DOS from guest
> userspace.
As I recall, the original report to the security team was something
along the lines of "Steam has just updated a game, and now when I start
it, the VM explodes".
> KVM already handles ICEBP by skipping the ICEBP instruction
> and thus clears MOVSS blocking as a side effect of its "emulation".
>
> [1] http://xenbits.xenproject.org/docs/xtf/xsa-308_2main_8c_source.html
This URL is at the whim of doxygen and not necessarily stable.
https://xenbits.xen.org/gitweb/?p=xtf.git;a=blob;f=tests/xsa-308/main.c
ought to have better longevity, and it also includes the test description.
~Andrew