Re: general protection fault in vmx_vcpu_run

From: Dmitry Vyukov
Date: Thu Jul 05 2018 - 01:33:22 EST


On Wed, Jul 4, 2018 at 9:31 PM, Raslan, KarimAllah <karahmed@xxxxxxxxx> wrote:
> Dmitry,
>
> Can you share the host kernel version?
>
> I cannot reproduce any of these crash signatures and I think it's
> really a nested virtualization bug. So I will need the exact host
> kernel version as well.
>
> I am currently getting all sorts of:
>
> "KVM: entry failed, hardware error 0x7"
>
> ... instead of the crash signatures that you are posting.


Hi Raslan,

The tested kernel runs as a GCE VM.
Jim, how can we describe the host kernel for GCE? Potentially only we
can debug this.


> On Sat, 2018-06-30 at 08:09 +0000, Raslan, KarimAllah wrote:
>> Looking also at the other crash [0]:
>>
>> msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
>> ffffffff811f65b7: e8 44 cb 57 00          callq  ffffffff81773100 <__sanitizer_cov_trace_pc>
>> ffffffff811f65bc: 48 8b 54 24 08          mov    0x8(%rsp),%rdx
>> ffffffff811f65c1: 48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
>> ffffffff811f65c8: fc ff df
>> ffffffff811f65cb: 48 c1 ea 03             shr    $0x3,%rdx
>> ffffffff811f65cf: 80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)   <- fault here.
>> ffffffff811f65d3: 0f 85 36 19 00 00       jne    ffffffff811f7f0f <vmx_vcpu_run+0x236f>
>>
>> %rdx should contain a pointer to loaded_vmcs. It is directly loaded
>> from the stack [0x8(%rsp)]. This same stack location was just used
>> before the inlined assembly for VMRESUME/VMLAUNCH here:
>>
>> vmx->__launched = vmx->loaded_vmcs->launched;
>> ffffffff811f639f: e8 5c cd 57 00          callq  ffffffff81773100 <__sanitizer_cov_trace_pc>
>> ffffffff811f63a4: 48 8b 54 24 08          mov    0x8(%rsp),%rdx
>> ffffffff811f63a9: 48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
>> ffffffff811f63b0: fc ff df
>> ffffffff811f63b3: 48 c1 ea 03             shr    $0x3,%rdx
>> ffffffff811f63b7: 80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)   <- used here.
>>
>> ... and this stack location was never touched by anything in between!
>> So something must have corrupted the stack itself, not really the
>> kvm_vcpu struct.
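
For readers following the two listings: the movabs/shr/cmpb sequence looks like the inline KASAN shadow check that the compiler emits before a memory access. A minimal sketch of that address computation is below. The constants are the ones visible in the disassembly; the function name kasan_shadow_addr is hypothetical and used only for illustration, and the assumption that this kernel is a KASAN build is inferred from the listing, not stated in the thread.

```c
#include <stdint.h>

/* Constants taken from the shr/movabs operands in the listing above. */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET      0xdffffc0000000000ULL

/*
 * Hypothetical helper: address of the shadow byte that the cmpb reads
 * for a given pointer, i.e. (%rdx >> 3) + %rax in the disassembly.
 */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
    return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}
```

With a NULL pointer in %rdx the computed shadow address is the bare offset 0xdffffc0000000000, which is not a canonical address, so the cmpb would raise a general protection fault rather than a page fault. That would be consistent with the GPF signature if the value read from 0x8(%rsp) was NULL.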
>>
>> Obviously the inlined assembly block is using the stack as well, but I
>> can not see anything that would cause this corruption there.
>>
>> That being said, looking at the %rsp and %rbp values that are dumped
>> in the stack trace:
>>
>> RSP: ffff8801b7d7f380
>> RBP: ffff8801b8260140
>>
>> ... they are almost 4.9 MiB apart! Shouldn't these two registers be a
>> bit closer to each other? :)
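
The distance between the two dumped values can be checked with a few lines of C. This is only a sketch: the helper names are hypothetical, and the 16 KiB thread stack size is an assumption about the configuration (builds with KASAN typically use a larger stack), included just to show that the gap is far beyond any plausible single kernel stack.

```c
#include <stdint.h>

/* Assumed per-thread kernel stack size; not taken from the report. */
#define ASSUMED_THREAD_SIZE (16 * 1024ULL)

/* Absolute distance in bytes between two register values. */
static uint64_t reg_distance(uint64_t a, uint64_t b)
{
    return a > b ? a - b : b - a;
}

/* True if both values could plausibly point into the same stack. */
static int within_one_stack(uint64_t rsp, uint64_t rbp)
{
    return reg_distance(rsp, rbp) < ASSUMED_THREAD_SIZE;
}
```

For RSP = 0xffff8801b7d7f380 and RBP = 0xffff8801b8260140 the distance is 0x4e0dc0 bytes (5115328), a little under 4.9 MiB, so within_one_stack() returns 0 for this pair.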
>>
>> So 2 possibilities here:
>>
>> 1- %rsp is wrong
>>
>> That would explain why the loaded_vmcs was NULL. However, it is a bit
>> harder to understand how it became wrong! It should have been restored
>> during the VMEXIT from the HOST_RSP value in the VMCS!
>>
>> Is this a nested setup?
>>
>> 2- %rbp is wrong
>>
>> That would also explain why the loaded_vmcs was NULL. Whatever
>> corrupted the stack that caused loaded_vmcs to be NULL could have also
>> corrupted the %rbp saved in the stack. That would mean that it happened
>> during a function call. All function calls that happened between the
>> point when the stack was sane (just before the "asm" block for
>> VMLAUNCH) and the crash-site are only kcov related. Looking at kcov, I
>> cannot see where the stack would get corrupted, though! Obviously,
>> another source of corruption could be a completely unrelated thread
>> directly corrupting this thread's memory.
>>
>> Maybe it would be easier to just try to repro it first and see which
>> one is true (if at all).
>>
>> [0] https://syzkaller.appspot.com/bug?extid=cc483201a3c6436d3550
>>
>>
>> On Thu, 2018-06-28 at 10:18 -0700, Jim Mattson wrote:
>> >
>> > 22: 0f 01 c3 vmresume
>> > 25: 48 89 4c 24 08 mov %rcx,0x8(%rsp)
>> > 2a: 59 pop %rcx
>> >
>> > <rip>:
>> > 2b: 0f 96 81 88 56 00 00 setbe 0x5688(%rcx)
>> > 32: 48 89 81 00 03 00 00 mov %rax,0x300(%rcx)
>> > 39: 48 89 99 18 03 00 00 mov %rbx,0x318(%rcx)
>> >
>> > %rcx should be pointing to the vcpu_vmx structure, but it's not even
>> > canonical: 1ffff10035842e78.
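
To make the "not even canonical" remark concrete, here is a small sketch of a canonical-address test, assuming 48-bit virtual addresses (4-level paging). The function name is hypothetical and this is not the kernel's own helper.

```c
#include <stdint.h>

/*
 * With 48-bit virtual addresses, an address is canonical when bits
 * 63..47 are all zeros or all ones (sign extension of bit 47).
 */
static int is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;   /* the 17 most significant bits */

    return top == 0 || top == 0x1ffff;
}
```

For 0x1ffff10035842e78 the top 17 bits are 0x3fff, so the check fails, while a normal kernel pointer such as the dumped RSP value 0xffff8801b7d7f380 passes.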
>> >