Re: [RFC] KVM: SVM: do not drop VMCB CPL to 0 if SS is not present

From: Roman Penyaev
Date: Wed May 24 2017 - 15:19:29 EST


On Sun, May 21, 2017 at 10:19 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>>
>>>
>>> Unless... is this the sysret_ss_attrs issue?
>>
>>
>> What is the issue? This one
>>
>> https://lkml.org/lkml/2015/4/24/770
>
>
> Yes.
>
> But I was thinking about it wrong, since this is probably 64-bit userspace,

Sorry, I forgot to mention that userspace is indeed 64-bit.

> not 32-bit userspace. Here's my theory:
>
> 1. User task A does a syscall. It's now in kernel mode with SS != 0.
>
> 2. The scheduler runs and switches to task B. SS != 0.
>
> 3. Kernel enters user mode for task B.
>
> 4. User task B gets interrupted. Kernel ends up running with SS = 0.
>
> 5. Kernel switches back to task A. SS == 0.
>
> 6. Kernel does SYSRET. SS == __USER_DS, but SS's attributes are messed up.
>
> 7. QEMU does whatever it does that inspires it to zap SS's attributes.
>
> 8. Boom.
>
> If task B were 32-bit, then the vDSO would fix up SS, so there would only be
> a 1-instruction window for problems.
>
> To check this theory, you could try backporting this to the guest and seeing
> if the problem goes away:
>
> commit 61f01dd941ba9e06d2bf05994450ecc3d61b6b8b
> Author: Andy Lutomirski <luto@xxxxxxxxxx>
> Date: Sun Apr 26 16:47:59 2015 -0700
>
> x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue


Yes, that is exactly what is happening. I've backported your patch to 3.16.
That explains everything: why the bug does not reproduce on >= 4.1 guest
kernels, and why we exit from VMRUN with SS.attributes == 0x400, i.e. the
P bit is not set (because of "AMD CPUs have a misfeature").
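
To make the 0x400 value concrete, here is a minimal sketch that decodes it.
It assumes the 12-bit VMCB segment attribute layout (type in bits 0-3, S in
bit 4, DPL in bits 5-6, P in bit 7, D/B in bit 10), which matches the
SVM_SELECTOR_* shifts in asm/svm.h as far as I can tell. The helper names
(attr_dpl(), attr_present(), naive_cpl_from_ss(), ...) are made up for this
illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions in the 12-bit VMCB segment attribute field. */
#define ATTR_DPL_SHIFT  5
#define ATTR_DPL_MASK   0x3u
#define ATTR_P_SHIFT    7
#define ATTR_DB_SHIFT   10

/* Descriptor privilege level encoded in the attribute field. */
static inline unsigned attr_dpl(uint16_t attr)
{
	return (attr >> ATTR_DPL_SHIFT) & ATTR_DPL_MASK;
}

/* Non-zero if the Present bit is set. */
static inline int attr_present(uint16_t attr)
{
	return (attr >> ATTR_P_SHIFT) & 1;
}

/* Non-zero if the D/B (default operand size) bit is set. */
static inline int attr_db(uint16_t attr)
{
	return (attr >> ATTR_DB_SHIFT) & 1;
}

/*
 * Hypothetical model of "restore CPL from SS": simply trust SS.DPL,
 * without checking whether the segment is present at all.
 */
static inline unsigned naive_cpl_from_ss(uint16_t attr)
{
	return attr_dpl(attr);
}

/* Decode the value seen in this thread after VMRUN. */
static void check_value_from_thread(void)
{
	const uint16_t attr = 0x400;

	assert(!attr_present(attr));          /* P bit is clear       */
	assert(attr_db(attr));                /* only D/B is set      */
	assert(naive_cpl_from_ss(attr) == 0); /* DPL field reads as 0 */
}
```

So with attributes of 0x400 only D/B is set, P is clear, and the DPL field
reads as 0. Anything that derives CPL from SS.DPL of such a selector ends up
with CPL 0 even though the guest was in userspace. By contrast, a typical
user data segment (for example 0xcf3 under the same assumed layout) has P
set and DPL == 3.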


>>> Looks like the bug is in QEMU, then, right?
>>
>>
>> KVM SVM restores CPL from unusable selector, obviously this is not nice.
>
>
> I would imagine that QEMU shouldn't be feeding KVM such a selector. Also,
> there's an invariant that SS.DPL == CPL, at least most of the time, although
> this SYSRET issue may be the exception.
>
> Paolo, what's the intended behavior here? Is the bug in KVM or in QEMU?

So, along with Andrew's workaround for the kernel, it seems that the
virtualization side should also be fixed to work around the AMD behaviour.

Guys, any comments?

--
Roman