Re: [RFC PATCH] x86/split_lock: Disable SLD if an unaware (out-of-tree) module enables VMX

From: Sean Christopherson
Date: Fri Apr 03 2020 - 13:20:21 EST


On Fri, Apr 03, 2020 at 06:42:44PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 03, 2020 at 09:30:07AM -0700, Sean Christopherson wrote:
> > Hook into native CR4 writes to disable split-lock detection if CR4.VMXE
> > is toggled on by an SLD-unaware entity, e.g. an out-of-tree hypervisor
> > module. Most/all VMX-based hypervisors blindly reflect #AC exceptions
> > into the guest, or don't intercept #AC in the first place. With SLD
> > enabled, this results in unexpected #AC faults in the guest, leading to
> > crashes in the guest and other undesirable behavior.
> >
> > Reported-by: "Kenneth R. Crudup" <kenny@xxxxxxxxx>
> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > Cc: Jessica Yu <jeyu@xxxxxxxxxx>
> > Cc: Rasmus Villemoes <rasmus.villemoes@xxxxxxxxx>
> > Cc: Kenneth R. Crudup <kenny@xxxxxxxxx>
> > Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > Cc: Fenghua Yu <fenghua.yu@xxxxxxxxx>
> > Cc: Xiaoyao Li <xiaoyao.li@xxxxxxxxx>
> > Cc: Nadav Amit <namit@xxxxxxxxxx>
> > Cc: Thomas Hellstrom <thellstrom@xxxxxxxxxx>
> > Cc: Tony Luck <tony.luck@xxxxxxxxx>
> > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> > Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> > Cc: Jann Horn <jannh@xxxxxxxxxx>
> > Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> > Cc: David Laight <David.Laight@xxxxxxxxxx>
> > Cc: Doug Covelli <dcovelli@xxxxxxxxxx>
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > ---
> >
> > A bit ugly, but on the plus side the code is largely contained to intel.c.
> > I think forgoing the on_all_cpus() remote kill is safe?
>
> How would it be safe? You can't control where the module text will be
> run, or how quickly.

Ugh, I forgot about the stupid core scope behavior.

CR4.VMXE needs to be set on every logical CPU before that CPU can do VMXON
and enter a guest, so every CPU will come through this code and locally
disable SLD.

But an SMT sibling could race on the WRMSR and re-enable SLD on the CPU
that just killed SLD. Waiting until other CPUs stop enabling SLD should
work. Something like this? Disclaimer: memory ordering isn't my forte.

static atomic_t enabling_sld = ATOMIC_INIT(0);

static void sld_update_msr(bool on)
{
	u64 test_ctrl_val = msr_test_ctrl_cache;

	if (on && !sld_killed)
		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;

	if (test_ctrl_val & MSR_TEST_CTRL_SPLIT_LOCK_DETECT)
		atomic_inc(&enabling_sld);

	wrmsrl(MSR_TEST_CTRL, test_ctrl_val);

	if (test_ctrl_val & MSR_TEST_CTRL_SPLIT_LOCK_DETECT)
		atomic_dec(&enabling_sld);
}

void split_lock_cr4_write(unsigned long val)
{
	u64 ctrl;

	/*
	 * Out-of-tree hypervisors that aren't aware of split-lock will blindly
	 * reflect split-lock #AC into their guests. Kill split-lock detection
	 * if an unaware entity enables VMX.
	 */
	if (!static_cpu_has(X86_FEATURE_VMX) ||
	    !static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) ||
	    !(val & X86_CR4_VMXE) || atomic_read(&cr4_vmxe_split_lock_safe) ||
	    (native_read_cr4() & X86_CR4_VMXE))
		return;

	WARN_ON_ONCE(1);

	/*
	 * Set the global kill flag to prevent re-enabling SLD, e.g. via
	 * switch_to_sld().
	 */
	WRITE_ONCE(sld_killed, true);

	/*
	 * No need to forcefully disable SLD on other CPUs, they'll come here
	 * if/when they set CR4.VMXE. But, wait until no other threads are
	 * enabling SLD, i.e. have seen sld_killed, as the MSR may be shared
	 * by SMT siblings.
	 */
	while (atomic_read(&enabling_sld))
		;
	sld_update_msr(false);
}
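
For reference, the counter handshake above can be modeled in user space
with C11 atomics. This is only a sketch under stated assumptions, not the
kernel patch: model_msr, MODEL_SLD_BIT, model_sld_update_msr(),
model_kill_sld() and model_demo() are hypothetical names, a plain atomic
variable stands in for the core-scoped MSR_TEST_CTRL, and every operation
uses the default sequentially consistent ordering. One deliberate
difference from the snippet above: the writer bumps the counter before it
looks at the kill flag, so the killer cannot read a zero count while a
writer that missed the flag is still about to write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the split-lock detect bit in MSR_TEST_CTRL. */
#define MODEL_SLD_BIT (UINT64_C(1) << 29)

/* Assumption: one shared variable models the MSR shared by SMT siblings. */
static _Atomic uint64_t model_msr;
static atomic_bool model_sld_killed;
static atomic_int model_enabling_sld;

static void model_sld_update_msr(bool on)
{
	uint64_t val = 0;

	/* Announce the in-flight write before checking the kill flag. */
	atomic_fetch_add(&model_enabling_sld, 1);

	if (on && !atomic_load(&model_sld_killed))
		val |= MODEL_SLD_BIT;

	/* Models wrmsrl(MSR_TEST_CTRL, ...). */
	atomic_store(&model_msr, val);

	atomic_fetch_sub(&model_enabling_sld, 1);
}

static void model_kill_sld(void)
{
	/* Publish the kill flag so later writers stop setting the bit. */
	atomic_store(&model_sld_killed, true);

	/* Wait for writers that may not have seen the flag yet. */
	while (atomic_load(&model_enabling_sld))
		;

	model_sld_update_msr(false);
}

/* Single-threaded walk through enable, kill, attempted re-enable. */
static void model_demo(void)
{
	model_sld_update_msr(true);
	assert(atomic_load(&model_msr) & MODEL_SLD_BIT);

	model_kill_sld();
	assert(!(atomic_load(&model_msr) & MODEL_SLD_BIT));

	/* After the kill, a request to enable must leave the bit clear. */
	model_sld_update_msr(true);
	assert(!(atomic_load(&model_msr) & MODEL_SLD_BIT));
}
```

model_demo() only checks the sequencing on one thread; it says nothing
about how WRMSR is ordered against the atomics on real hardware.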