[PATCH 0/9] KVM: x86: Fix NULL pointer #GP due to RSM bug

From: Sean Christopherson
Date: Wed Jun 09 2021 - 14:57:50 EST


Fix a NULL pointer dereference in gfn_to_rmap() that occurs if RSM fails,
reported by syzbot.

The immediate problem is that the MMU context's role gets out of sync
because KVM clears the SMM flag in the vCPU at the start of RSM emulation,
but only resets the MMU context if RSM succeeds. The divergence in vCPU
vs. MMU role with respect to the SMM flag causes explosions if the non-SMM
memslots have gfn ranges that are not present in the SMM memslots, because
the MMU expects that the memslot for a shadow page cannot magically
disappear.
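
As a rough illustration of the ordering problem described above, here is a
minimal userspace sketch. Everything in it (struct toy_vcpu, toy_reset_mmu(),
toy_rsm_buggy(), toy_rsm_fixed()) is a hypothetical stand-in invented for this
example, not KVM's real code; it only models "flag cleared first, MMU resynced
only on success" versus "MMU resynced as soon as the flag changes".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; these are not KVM's real structures. */
struct toy_vcpu {
	bool in_smm;        /* vCPU's view of the SMM flag */
	bool mmu_role_smm;  /* MMU context's cached view of the SMM flag */
};

/* Stand-in for rebuilding the MMU context from current vCPU state. */
static void toy_reset_mmu(struct toy_vcpu *vcpu)
{
	vcpu->mmu_role_smm = vcpu->in_smm;
}

/* Buggy ordering: the MMU is resynced only if RSM succeeds. */
static int toy_rsm_buggy(struct toy_vcpu *vcpu, bool load_state_fails)
{
	vcpu->in_smm = false;
	if (load_state_fails)
		return -1;      /* MMU role still says "SMM" */
	toy_reset_mmu(vcpu);
	return 0;
}

/* Fixed ordering: resync immediately after the flag is cleared. */
static int toy_rsm_fixed(struct toy_vcpu *vcpu, bool load_state_fails)
{
	vcpu->in_smm = false;
	toy_reset_mmu(vcpu);
	return load_state_fails ? -1 : 0;
}

static bool toy_roles_in_sync(const struct toy_vcpu *vcpu)
{
	return vcpu->in_smm == vcpu->mmu_role_smm;
}

/* Self-check: a failed RSM desyncs the buggy variant only. */
void toy_rsm_selfcheck(void)
{
	struct toy_vcpu a = { .in_smm = true, .mmu_role_smm = true };
	struct toy_vcpu b = { .in_smm = true, .mmu_role_smm = true };

	assert(toy_rsm_buggy(&a, true) == -1);
	assert(!toy_roles_in_sync(&a));

	assert(toy_rsm_fixed(&b, true) == -1);
	assert(toy_roles_in_sync(&b));
}
```

In the sketch, the stale mmu_role_smm plays the part of the MMU context that
still looks up shadow pages against the SMM memslots after the vCPU has
already left SMM.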

The other obvious problem is that KVM doesn't emulate triple fault on RSM
failure, e.g. it keeps running the vCPU in a frankenstate instead of
exiting to userspace. Fixing that would squash the syzbot repro, but
would not fix the underlying issue because nothing prevents userspace from
calling KVM_RUN on a vCPU that hit shutdown (yay lack of a shutdown state).
But, it's easy to fix and definitely worth doing.
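
To make the "nothing prevents userspace from calling KVM_RUN again" point
concrete, here is a small sketch under the assumption that shutdown is
modeled as a one-shot request rather than a sticky vCPU state. The names
(struct toy_vcpu, toy_emulate_rsm(), toy_run(), TOY_EXIT_*) are hypothetical
and exist only for this example; real KVM reports the exit to userspace as
KVM_EXIT_SHUTDOWN.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical exit reasons for the toy model. */
enum toy_exit { TOY_EXIT_NONE, TOY_EXIT_SHUTDOWN };

struct toy_vcpu {
	bool triple_fault_pending;  /* one-shot request, not a sticky state */
	unsigned long insns_run;    /* guest "instructions" executed so far */
};

/* RSM emulation: on failure, request a triple fault instead of limping on. */
static void toy_emulate_rsm(struct toy_vcpu *vcpu, bool fails)
{
	if (fails)
		vcpu->triple_fault_pending = true;
}

/* Toy run entry point: service a pending request, else run the guest. */
static enum toy_exit toy_run(struct toy_vcpu *vcpu)
{
	if (vcpu->triple_fault_pending) {
		vcpu->triple_fault_pending = false;
		return TOY_EXIT_SHUTDOWN;
	}
	vcpu->insns_run++;
	return TOY_EXIT_NONE;
}

/* Self-check: the first run exits with shutdown, the second runs the guest. */
void toy_shutdown_selfcheck(void)
{
	struct toy_vcpu vcpu = { 0 };

	toy_emulate_rsm(&vcpu, true);
	assert(toy_run(&vcpu) == TOY_EXIT_SHUTDOWN);
	assert(vcpu.insns_run == 0);

	/* Userspace ignores the shutdown and simply runs the vCPU again. */
	assert(toy_run(&vcpu) == TOY_EXIT_NONE);
	assert(vcpu.insns_run == 1);
}
```

The second toy_run() call is why exiting to userspace alone cannot be the
whole fix: the vCPU runs again with whatever state the failed RSM left behind.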

Everything after the two bug fixes is cleanup.

Ben Gardon has an internal patch or two that guards against the NULL
pointer dereference in gfn_to_rmap(). I'm planning on getting that
functionality posted (needs a little massaging) so that these types of
snafus don't crash the host (this isn't the first time I've introduced a
bug that broke gfn_to_rmap(), though thankfully it's the first time such
a bug has made it upstream, knock on wood).

Amusingly, adding gfn_to_rmap() NULL memslot checks might even be a
performance improvement. Because gfn_to_rmap() doesn't check the memslot
before using it, and because the compiler can see that search_memslots()
can return NULL/0, gcc often/always generates dedicated (and hilarious) code
for NULL, e.g. this #GP was caused by an explicit load from 0:

  48 8b 14 25 00 00 00 00    mov    0x0,%rdx
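
For illustration only, a guarded lookup might look roughly like the sketch
below. The types and helpers (struct toy_memslot, toy_search_memslots(),
toy_gfn_to_rmap()) are simplified stand-ins made up for this example and do
not match the real KVM definitions or Ben's patches; the point is just the
explicit NULL check between the search and the dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_gfn_t;

/* Hypothetical, heavily simplified memslot. */
struct toy_memslot {
	toy_gfn_t base_gfn;
	uint64_t npages;
	unsigned long *rmap;    /* one rmap head per page in the slot */
};

/* Linear search; returns NULL when no slot covers the gfn. */
static struct toy_memslot *toy_search_memslots(struct toy_memslot *slots,
					       size_t nr_slots, toy_gfn_t gfn)
{
	for (size_t i = 0; i < nr_slots; i++) {
		if (gfn >= slots[i].base_gfn &&
		    gfn - slots[i].base_gfn < slots[i].npages)
			return &slots[i];
	}
	return NULL;
}

/* Guarded lookup: bail out instead of dereferencing a NULL slot. */
static unsigned long *toy_gfn_to_rmap(struct toy_memslot *slots,
				      size_t nr_slots, toy_gfn_t gfn)
{
	struct toy_memslot *slot = toy_search_memslots(slots, nr_slots, gfn);

	if (!slot)
		return NULL;
	return &slot->rmap[gfn - slot->base_gfn];
}

/* Self-check: a covered gfn maps to its rmap head, an uncovered one to NULL. */
void toy_rmap_selfcheck(void)
{
	unsigned long heads[4] = { 0 };
	struct toy_memslot slots[] = {
		{ .base_gfn = 0x100, .npages = 4, .rmap = heads },
	};

	assert(toy_gfn_to_rmap(slots, 1, 0x102) == &heads[2]);
	assert(toy_gfn_to_rmap(slots, 1, 0x200) == NULL);
}
```

Callers would of course have to handle the NULL return; what they should do
in that case is outside the scope of this sketch.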


Sean Christopherson (9):
  KVM: x86: Immediately reset the MMU context when the SMM flag is
    cleared
  KVM: x86: Emulate triple fault shutdown if RSM emulation fails
  KVM: x86: Replace .set_hflags() with dedicated .exiting_smm() helper
  KVM: x86: Invoke kvm_smm_changed() immediately after clearing SMM flag
  KVM: x86: Move (most) SMM hflags modifications into kvm_smm_changed()
  KVM: x86: Move "entering SMM" tracepoint into kvm_smm_changed()
  KVM: x86: Rename SMM tracepoint to make it reflect reality
  KVM: x86: Drop .post_leave_smm(), i.e. the manual post-RSM MMU reset
  KVM: x86: Drop "pre_" from enter/leave_smm() helpers

 arch/x86/include/asm/kvm-x86-ops.h |  4 +--
 arch/x86/include/asm/kvm_host.h    |  4 +--
 arch/x86/kvm/emulate.c             | 31 ++++++++++-------
 arch/x86/kvm/kvm_emulate.h         |  7 ++--
 arch/x86/kvm/svm/svm.c             |  8 ++---
 arch/x86/kvm/trace.h               |  2 +-
 arch/x86/kvm/vmx/vmx.c             |  8 ++---
 arch/x86/kvm/x86.c                 | 53 +++++++++++++++---------------
 8 files changed, 61 insertions(+), 56 deletions(-)

--
2.32.0.rc1.229.g3e70b5a671-goog