On Sun, May 21, 2017 at 5:31 AM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
On 05/19/2017 09:14 AM, Roman Penyaev wrote:
Hi folks,
After experiencing guest double faults (sometimes triple faults) on
3.16 guest kernels with the following common pattern:
[cut]
Further tracking of VMCB states before and after VMRUN showed that
CPL becomes 0 when VMEXIT happens with the following SS segment:
ss = {
selector = 0x2b,
attrib = 0x400,
limit = 0xffffffff,
base = 0x0
},
cpl = 0x3
Then on the next VMRUN the VMCB looks as follows:
ss = {
selector = 0x2b,
attrib = 0x0, <<< dropped to 0
limit = 0xffffffff,
base = 0x0
},
cpl = 0x0, <<< dropped to 0
Obviously it was changed between VMRUN calls. The following backtrace
shows that VMCB.SAVE.CPL was set to 0 by QEMU itself:
CPU: 55 PID: 59531 Comm: kvm
[<ffffffffa00a3a20>] kvm_arch_vcpu_ioctl_set_sregs+0x2e0/0x480 [kvm]
[<ffffffffa008ddf0>] kvm_write_guest_cached+0x540/0xc00 [kvm]
[<ffffffff8107d695>] ? finish_task_switch+0x185/0x240
[<ffffffff8180097c>] ? __schedule+0x28c/0xa10
[<ffffffff811a9aad>] do_vfs_ioctl+0x2cd/0x4a0
The SS segment which came from QEMU had the following struct members:
SS->base = 0
SS->limit = ffffffff
SS->selector = 2b
SS->type = 0
SS->present = 0
SS->dpl = 0
SS->db = 0
SS->s = 0
SS->l = 0
SS->g = 0
SS->avl = 0
SS->unusable = 1
Indeed, on the last VMEXIT the SS segment does not have the present (P) bit
set in its segment attributes:
(gdb) p 0x400 & (1 << SVM_SELECTOR_P_SHIFT)
$1 = 0
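For reference, the same check can be written out in C. This is only a small sketch: the shift values are the SVM_SELECTOR_* definitions from arch/x86/include/asm/svm.h, while struct attrib_bits and decode_attrib() are hypothetical helpers made up for this illustration.

```c
#include <stdint.h>

/* Bit positions inside the VMCB segment attribute field
 * (values as defined in arch/x86/include/asm/svm.h). */
#define SVM_SELECTOR_S_SHIFT   4
#define SVM_SELECTOR_DPL_SHIFT 5
#define SVM_SELECTOR_P_SHIFT   7
#define SVM_SELECTOR_DB_SHIFT  10

/* Hypothetical helper type: a few decoded attribute bits. */
struct attrib_bits {
    unsigned s;    /* descriptor type (system vs. code/data) */
    unsigned dpl;  /* descriptor privilege level, 2 bits */
    unsigned p;    /* present */
    unsigned db;   /* default operand size */
};

/* Hypothetical helper: split an attrib value into the bits above. */
static struct attrib_bits decode_attrib(uint16_t attrib)
{
    struct attrib_bits b;

    b.s   = (attrib >> SVM_SELECTOR_S_SHIFT) & 1;
    b.dpl = (attrib >> SVM_SELECTOR_DPL_SHIFT) & 3;
    b.p   = (attrib >> SVM_SELECTOR_P_SHIFT) & 1;
    b.db  = (attrib >> SVM_SELECTOR_DB_SHIFT) & 1;
    return b;
}
```

Calling decode_attrib(0x400) gives db = 1 and everything else 0, i.e. only bit 10 (DB) is set in the attrib value seen on that VMEXIT.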
Huh? How is that even possible? It should not be possible to actually run
the vCPU with a non-NULL SS that isn't present.
That is a very good question :) I do not know. According to my shallow
understanding, the (P) bit is only a hint for the CPU that the corresponding
segment was read from the GDT and is now cached in private CPU registers
(attributes). Am I right?
At least what I see is that quite often we exit from VMRUN with the segment
not present, then VMRUN is resumed, and on the next vmexit the segment has
correct attributes.
How would you cause it to happen?
We run fio and iperf tests in guests for a couple of days. Nothing more,
nothing special. Guests are bare Debian installs with 3.16 kernels.
Unless... is this the sysret_ss_attrs issue?
Which issue is that? This one?
https://lkml.org/lkml/2015/4/24/770
So when on VMEXIT we have such an SS state (P bit is not set) and QEMU
just decides to synchronize registers, the following happens on the QEMU
side:
kvm_cpu_synchronize_state():
kvm_arch_get_registers():
...
get_seg():
if (rhs->unusable) {
lhs->flags = 0; <<< SS is unusable ((P) is not set),
<<< all attributes are dropped to 0.
}
cpu->kvm_vcpu_dirty = true; <<< Mark VCPU state as dirty
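To make the effect of that branch concrete, here is a rough model of it. It is a sketch only: model_kvm_segment, model_seg_cache, model_get_seg() and model_flags_dpl() are hypothetical, trimmed-down stand-ins and not QEMU's real types or functions, and the flag positions assume the usual x86 descriptor layout (DPL in bits 13-14, P in bit 15 of the high dword).

```c
#include <stdint.h>

/* Assumed descriptor-style flag positions (high dword of an x86 descriptor). */
#define MODEL_DPL_SHIFT 13
#define MODEL_P_MASK    (1u << 15)

/* Hypothetical stand-in for the segment as reported by KVM. */
struct model_kvm_segment {
    uint16_t selector;
    uint8_t dpl;
    uint8_t present;
    uint8_t unusable;
};

/* Hypothetical stand-in for the segment as cached in userspace. */
struct model_seg_cache {
    uint16_t selector;
    uint32_t flags;
};

/* Model of the branch quoted above: an unusable segment loses all flags,
 * including whatever DPL the kernel reported. */
static void model_get_seg(struct model_seg_cache *lhs,
                          const struct model_kvm_segment *rhs)
{
    lhs->selector = rhs->selector;
    if (rhs->unusable) {
        lhs->flags = 0;
    } else {
        lhs->flags = ((uint32_t)rhs->dpl << MODEL_DPL_SHIFT) |
                     (rhs->present ? MODEL_P_MASK : 0);
    }
}

/* Extract the DPL back out of the cached flags. */
static unsigned model_flags_dpl(uint32_t flags)
{
    return (flags >> MODEL_DPL_SHIFT) & 3;
}
```

In this model, a segment with selector 0x2b, dpl = 3 and unusable = 1 ends up with flags == 0, so the DPL read back from the cache is 0; the same segment with unusable = 0 keeps DPL 3.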
Looks like the bug is in QEMU, then, right?
KVM SVM restores CPL from an unusable selector; obviously this is not nice.
arch/x86/kvm/svm.c:svm_set_segment():
if (var->unusable)
s->attrib = 0;
...
if (seg == VCPU_SREG_SS)
svm->vmcb->save.cpl = (s->attrib >> SVM_SELECTOR_DPL_SHIFT) & 3;
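The consequence of those lines can be shown with a simplified model. Again this is only a sketch: struct model_segment, model_attrib() and model_cpl_from_ss() are hypothetical names that mimic the snippet above, not the actual svm.c code; only the SVM_SELECTOR_* values are taken from the kernel header.

```c
#include <stdint.h>

/* Attribute layout as in arch/x86/include/asm/svm.h. */
#define SVM_SELECTOR_TYPE_MASK 0xf
#define SVM_SELECTOR_S_SHIFT   4
#define SVM_SELECTOR_DPL_SHIFT 5
#define SVM_SELECTOR_P_SHIFT   7
#define SVM_SELECTOR_AVL_SHIFT 8
#define SVM_SELECTOR_L_SHIFT   9
#define SVM_SELECTOR_DB_SHIFT  10
#define SVM_SELECTOR_G_SHIFT   11

/* Hypothetical stand-in for the segment fields passed in from userspace. */
struct model_segment {
    uint8_t type, present, dpl, db, s, l, g, avl, unusable;
};

/* Model: pack the fields into an attrib value; unusable forces attrib to 0. */
static uint16_t model_attrib(const struct model_segment *var)
{
    if (var->unusable)
        return 0;

    return (uint16_t)((var->type & SVM_SELECTOR_TYPE_MASK) |
                      ((var->s & 1) << SVM_SELECTOR_S_SHIFT) |
                      ((var->dpl & 3) << SVM_SELECTOR_DPL_SHIFT) |
                      ((var->present & 1) << SVM_SELECTOR_P_SHIFT) |
                      ((var->avl & 1) << SVM_SELECTOR_AVL_SHIFT) |
                      ((var->l & 1) << SVM_SELECTOR_L_SHIFT) |
                      ((var->db & 1) << SVM_SELECTOR_DB_SHIFT) |
                      ((var->g & 1) << SVM_SELECTOR_G_SHIFT));
}

/* Model: CPL is derived from the DPL bits of the SS attrib. */
static unsigned model_cpl_from_ss(const struct model_segment *var)
{
    return (model_attrib(var) >> SVM_SELECTOR_DPL_SHIFT) & 3;
}
```

Fed with the SS values that came from QEMU (all fields 0, unusable = 1), model_cpl_from_ss() returns 0. A usable user-mode SS (for example type = 3, s = 1, dpl = 3, present = 1, db = 1, g = 1) gives attrib 0xcf3 and CPL 3.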
Meanwhile QEMU resets the attributes, despite the fact that the DPL (which is
passed from KVM) is correct.
So it is not clear what the proper way to fix that is. What is clear is
that CPL is set to 0 because of this game with registers on both sides.
Now the question is which side to fix, or probably both.
Couldn't you just fix this code
in QEMU by, say, deleting it?
Certainly, but it would be nice to hear from the KVM maintainers. At least the
issue is clear, and what is left is a proper one-line fix :)