Re: [PATCH v2 7/9] x86: Use current USER_CS to setup correct context on vmx entry

From: Sean Christopherson
Date: Fri Oct 02 2020 - 18:40:11 EST


On Thu, Oct 01, 2020 at 02:52:32PM -0700, Andy Lutomirski wrote:
> On Thu, Oct 1, 2020 at 1:59 PM Gabriel Krisman Bertazi
> <krisman@xxxxxxxxxxxxx> wrote:
> >
> > vmx_prepare_switch_to_guest shouldn't use is_64bit_mm, which has a
> > very specific use in uprobes. Use the user_64bit_mode helper instead.
> > This reduces the usage of is_64bit_mm, which is awkward, since it relies
> > on the personality at load time, which is fine for uprobes, but doesn't
> > seem fine here.
> >
> > I tested this by running VMs with 64 and 32 bits payloads from 64/32
> > programs.
> >
> > Signed-off-by: Gabriel Krisman Bertazi <krisman@xxxxxxxxxxxxx>
> > ---
> > arch/x86/kvm/vmx/vmx.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 7b2a068f08c1..b5aafd9e5f5d 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1172,7 +1172,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> > savesegment(es, host_state->es_sel);
> >
> > gs_base = cpu_kernelmode_gs_base(cpu);
> > - if (likely(is_64bit_mm(current->mm))) {
> > + if (likely(user_64bit_mode(current_pt_regs()))) {
> > current_save_fsgs();
> > fs_sel = current->thread.fsindex;
> > gs_sel = current->thread.gsindex;
>
> I disagree with this one. This whole code path is nonsense. Can you
> just remove the condition entirely and use the 64-bit path
> unconditionally?

I finally came back to this one with fresh eyes. I've read through the code
a bajllion times and typed up half a dozen responses. I think, finally, I
understand what's broken.

I believe your original assertion that the bug was misdiagnosed is correct
(can't link because LKML wasn't in the loop). I'm pretty sure your analysis
that KVM's handling of things works mostly by coincidence is also correct.

The coincidence is that "real" VMMs all use arch_prctl(), and
do_arch_prctl_64() ensures thread.{fs,gs}base are accurate. save_base_legacy()
detects sel==0 and intentionally does nothing, knowing the the base is already
accurate.

Userspaces that don't use arch_prctl(), in the bug report case a 32-bit compat
test, may or may not have accurate thread.{fs,gs}base values. This is
especially true if sel!=0 as save_base_legacy() explicitly zeros the base in
this case, as load_seg_legacy() will restore the seg on the backend.

KVM on the other hand assumes thread.{fs,gs}base are always fresh. When that
didn't hold true for userspace that didn't use arch_prctl(), the fix of
detecting a !64-bit mm just so happened to work because all 64-bit VMMs use
arch_prctl().

It's tempting to just open code this and use RD{FS,GS}BASE when possible,
i.e. avoid any guesswork. Maybe with a module param that userspace can set
to tell KVM it doesn't do anything fancy with FS/GS base (will userspace still
use arch_prctl() even if FSGSABSE is available?).

savesegment(fs, fs_sel);
savesegment(gs, gs_sel);
if (use_current_fsgs_base) {
fs_base = current->thread.fsbase;
vmx->msr_host_kernel_gs_base = current->thread.gsbase;
} else if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
fs_base = rdfsbase()
vmx->msr_host_kernel_gs_base = __rdgsbase_inactive();
} else {
fs_base = read_msr(MSR_FS_BASE);
vmx->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
}

I'll revisit this on Monday and run a patch by Paolo.