Re: general protection fault in vmx_vcpu_run

From: Raslan, KarimAllah
Date: Sat Jun 30 2018 - 04:11:02 EST


Looking also at the other crash [0]:

    msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
ffffffff811f65b7:       e8 44 cb 57 00          callq  ffffffff81773100 <__sanitizer_cov_trace_pc>
ffffffff811f65bc:       48 8b 54 24 08          mov    0x8(%rsp),%rdx
ffffffff811f65c1:       48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
ffffffff811f65c8:       fc ff df
ffffffff811f65cb:       48 c1 ea 03             shr    $0x3,%rdx
ffffffff811f65cf:       80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)     <- fault here.
ffffffff811f65d3:       0f 85 36 19 00 00       jne    ffffffff811f7f0f <vmx_vcpu_run+0x236f>

%rdx should contain a pointer to loaded_vmcs. It is directly loaded
from the stack [0x8(%rsp)]. This same stack location was just used
before the inlined assembly for VMRESUME/VMLAUNCH here:

    vmx->__launched = vmx->loaded_vmcs->launched;
ffffffff811f639f:       e8 5c cd 57 00          callq  ffffffff81773100 <__sanitizer_cov_trace_pc>
ffffffff811f63a4:       48 8b 54 24 08          mov    0x8(%rsp),%rdx
ffffffff811f63a9:       48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
ffffffff811f63b0:       fc ff df
ffffffff811f63b3:       48 c1 ea 03             shr    $0x3,%rdx
ffffffff811f63b7:       80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)     <- used here.

... and this stack location was never touched by anything in between!
So something must have corrupted the stack itself, not the kvm_vcpu
struct.
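
For reference, the shr $0x3 / cmpb $0x0,(%rdx,%rax,1) pair in both
listings is the KASAN shadow-memory check that the compiler emits before
dereferencing the pointer in %rdx: the address is shifted right by 3 and
the shadow byte at that offset from 0xdffffc0000000000 is tested. Below
is a minimal userspace sketch of that arithmetic, assuming 48-bit
(4-level paging) canonical addresses. The helper names are made up for
illustration and are not kernel APIs. It shows why a NULL value read
from 0x8(%rsp) would show up as a general protection fault rather than a
page fault: the shadow address computed from it is non-canonical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86_64 KASAN constants as they appear in the disassembly. */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET      0xdffffc0000000000ULL

/* Shadow byte address for a kernel address: (addr >> 3) + offset. */
static inline uint64_t kasan_shadow_addr(uint64_t addr)
{
    return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/*
 * Assumption: 48-bit virtual addresses, so an address is canonical
 * when bits 63..47 are all zeros or all ones.
 */
static inline bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;

    return top == 0 || top == 0x1ffff;
}

/* Hypothetical self-check: a NULL pointer maps to a non-canonical
 * shadow address, so the cmpb on it raises #GP. */
static inline void check_null_shadow(void)
{
    assert(!is_canonical(kasan_shadow_addr(0)));
}
```

Here kasan_shadow_addr(0) evaluates to 0xdffffc0000000000, for which
is_canonical() returns false, while a sane kernel pointer maps into the
canonical shadow region.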

Obviously the inlined assembly block is using the stack as well, but I
cannot see anything that would cause this corruption there.

That being said, looking at the %rsp and %rbp values that are dumped
in the stack trace:

RSP: ffff8801b7d7f380
RBP: ffff8801b8260140

... they are almost 4.9 MiB apart! Shouldn't these two registers be a
bit closer to each other? :)
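
The gap can be checked with a few lines of C. This is only a sketch: the
function names are hypothetical, and the 32 KiB stack size is an
assumption (x86_64 with KASAN enabled), passed in as a parameter so it
can be changed. It also assumes %rbp is being used as a frame pointer
and that kernel stacks are aligned to their size.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute distance in bytes between the two register values. */
static inline uint64_t reg_gap_bytes(uint64_t rsp, uint64_t rbp)
{
    return rbp > rsp ? rbp - rsp : rsp - rbp;
}

/*
 * Assumption: stacks are stack_size bytes long and stack_size-aligned
 * (stack_size must be a power of two). Two addresses on the same stack
 * then share the same aligned base.
 */
static inline int same_stack(uint64_t rsp, uint64_t rbp, uint64_t stack_size)
{
    uint64_t mask = ~(stack_size - 1);

    return (rsp & mask) == (rbp & mask);
}
```

For the dumped values, reg_gap_bytes(0xffff8801b7d7f380,
0xffff8801b8260140) is 0x4e0dc0 bytes (5115328, about 4.88 MiB), and
same_stack() with a 32 KiB stack size returns 0, so under these
assumptions the two registers cannot both point into the same kernel
stack.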

So 2 possibilities here:

1- %rsp is wrong

That would explain why the loaded_vmcs was NULL. However, it is a bit
harder to understand how it became wrong! It should have been restored
during the VMEXIT from the HOST_RSP value in the VMCS!

Is this a nested setup?

2- %rbp is wrong

That would also explain why the loaded_vmcs was NULL. Whatever
corrupted the stack that caused loaded_vmcs to be NULL could have also
corrupted the %rbp saved in the stack. That would mean that it happened
during a function call. All function calls that happened between the
point when the stack was sane (just before the "asm" block for
VMLAUNCH) and the crash-site are only kcov related. Looking at kcov, I
cannot see where the stack would get corrupted though! Obviously
another source of corruption could be a completely unrelated thread
directly corrupting this thread's memory.

Maybe it would be easier to just try to repro it first and see which
one is true (if at all).

[0] https://syzkaller.appspot.com/bug?extid=cc483201a3c6436d3550


On Thu, 2018-06-28 at 10:18 -0700, Jim Mattson wrote:
> 22: 0f 01 c3 vmresume
> 25: 48 89 4c 24 08 mov %rcx,0x8(%rsp)
> 2a: 59 pop %rcx
>
> <rip>:
> 2b: 0f 96 81 88 56 00 00 setbe 0x5688(%rcx)
> 32: 48 89 81 00 03 00 00 mov %rax,0x300(%rcx)
> 39: 48 89 99 18 03 00 00 mov %rbx,0x318(%rcx)
>
> %rcx should be pointing to the vcpu_vmx structure, but it's not even
> canonical: 1ffff10035842e78.
>