Re: general protection fault in vmx_vcpu_run

From: Raslan, KarimAllah
Date: Wed Jul 04 2018 - 15:32:05 EST

Next message: Vadim Pasternak: "RE: [PATCH] platform/mellanox: Use 2-factor allocator calls"
Previous message: Eric Biggers: "Re: general protection fault in kernel_sock_shutdown"
Next in thread: Dmitry Vyukov: "Re: general protection fault in vmx_vcpu_run"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Dmitry,

Can you share the host kernel version?

I cannot reproduce any of these crash signatures and I think it's
really a nested virtualization bug. So I will need the exact host
kernel version as well.

I am currently getting all sorts of:

"KVM: entry failed, hardware error 0x7"

... instead of the crash signatures that you are posting.

Regards.

On Sat, 2018-06-30 at 08:09 +0000, Raslan, KarimAllah wrote:
> Looking also at the other crash [0]:
>
>     msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
> ffffffff811f65b7:       e8 44 cb 57 00          callq  ffffffff81773100 <__sanitizer_cov_trace_pc>
> ffffffff811f65bc:       48 8b 54 24 08          mov    0x8(%rsp),%rdx
> ffffffff811f65c1:       48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
> ffffffff811f65c8:       fc ff df
> ffffffff811f65cb:       48 c1 ea 03             shr    $0x3,%rdx
> ffffffff811f65cf:       80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)     <- fault here.
> ffffffff811f65d3:       0f 85 36 19 00 00       jne    ffffffff811f7f0f <vmx_vcpu_run+0x236f>
>
> %rdx should contain a pointer to loaded_vmcs. It is directly loaded
> from the stack [0x8(%rsp)]. This same stack location was just used
> before the inlined assembly for VMRESUME/VMLAUNCH here:
>
>     vmx->__launched = vmx->loaded_vmcs->launched;
> ffffffff811f639f:       e8 5c cd 57 00          callq  ffffffff81773100 <__sanitizer_cov_trace_pc>
> ffffffff811f63a4:       48 8b 54 24 08          mov    0x8(%rsp),%rdx
> ffffffff811f63a9:       48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
> ffffffff811f63b0:       fc ff df
> ffffffff811f63b3:       48 c1 ea 03             shr    $0x3,%rdx
> ffffffff811f63b7:       80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)     <- used here.
>
> ... and this stack location was never touched by anything in between!
> So something must have corrupted the stack itself, not really the
> kvm_vcpu struct.
>
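The movabs/shr/cmpb sequence in both listings has the shape of a KASAN shadow-memory check. The following is a minimal Python sketch of that address computation, assuming the usual x86-64 KASAN layout (shadow offset 0xdffffc0000000000 and a scale shift of 3, both visible in the disassembly above) and 48-bit virtual addressing. The function names are made up for illustration. It shows that a NULL value loaded from 0x8(%rsp) turns into a non-canonical shadow address, which would produce a general protection fault rather than a page fault.

```python
# Sketch of the KASAN shadow lookup seen in the disassembly above.
# Assumptions: x86-64 KASAN layout, 48-bit virtual addresses.
KASAN_SHADOW_OFFSET = 0xdffffc0000000000  # constant from the movabs
KASAN_SHADOW_SCALE_SHIFT = 3              # the "shr $0x3,%rdx"
MASK64 = (1 << 64) - 1


def kasan_shadow_addr(addr):
    """Address of the shadow byte that cmpb reads for 'addr'."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


def is_canonical(addr):
    """True if bits 63..47 are all zeros or all ones."""
    top = addr >> 47
    return top == 0 or top == 0x1ffff


if __name__ == "__main__":
    for ptr in (0x0, 0xffff8801b7d7f388):
        shadow = kasan_shadow_addr(ptr)
        print(hex(ptr), "->", hex(shadow), "canonical:", is_canonical(shadow))
```

A sane kernel pointer maps into the shadow region and stays canonical; a NULL pointer does not.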
> Obviously the inlined assembly block is using the stack as well, but I
> cannot see anything that would cause this corruption there.
>
> That being said, looking at the %rsp and %rbp values that are dumped
> in the stack trace:
>
> RSP: ffff8801b7d7f380
> RBP: ffff8801b8260140
>
> ... they are almost 4.9 MiB apart! Should not these two registers be a
> bit closer to each other? :)
>
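To put an exact number on the distance between the dumped registers, here is a small Python sketch. The RSP and RBP values are copied from the dump above. The 16 KiB stack size is only an assumed baseline for x86-64 kernel stacks (KASAN builds use a larger one) and serves purely as a yardstick.

```python
# Distance between the dumped RSP and RBP values.
RSP = 0xffff8801b7d7f380
RBP = 0xffff8801b8260140

# Assumption: baseline x86-64 kernel stack size; KASAN builds use more.
ASSUMED_THREAD_SIZE = 16 * 1024

gap = abs(RBP - RSP)
gap_mib = gap / (1 << 20)

if __name__ == "__main__":
    print("gap: %#x bytes (%.2f MiB)" % (gap, gap_mib))
    print("gap / assumed stack size: %.0fx" % (gap / ASSUMED_THREAD_SIZE))
```

Whatever the exact stack size, a frame pointer and a stack pointer on the same kernel stack should be within that size of each other, which is clearly not the case here.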
> So 2 possibilities here:
>
> 1- %rsp is wrong
>
> That would explain why the loaded_vmcs was NULL. However, it is a bit
> harder to understand how it became wrong! It should have been restored
> during the VMEXIT from the HOST_RSP value in the VMCS!
>
> Is this a nested setup?
>
> 2- %rbp is wrong
>
> That would also explain why the loaded_vmcs was NULL. Whatever
> corrupted the stack that caused loaded_vmcs to be NULL could have also
> corrupted the %rbp saved in the stack. That would mean that it happened
> during a function call. All function calls that happened between the
> point when the stack was sane (just before the "asm" block for
> VMLAUNCH) and the crash-site are only kcov related. Looking at kcov, I
> cannot see where the stack would get corrupted though! Obviously
> another source of corruption can be a completely unrelated thread
> directly corrupting this thread's memory.
>
> Maybe it would be easier to just try to repro it first and see which
> one is true (if at all).
>
> [0] https://syzkaller.appspot.com/bug?extid=cc483201a3c6436d3550
>
>
> On Thu, 2018-06-28 at 10:18 -0700, Jim Mattson wrote:
> >
> > 22: 0f 01 c3 vmresume
> > 25: 48 89 4c 24 08 mov %rcx,0x8(%rsp)
> > 2a: 59 pop %rcx
> >
> > <rip>:
> > 2b: 0f 96 81 88 56 00 00 setbe 0x5688(%rcx)
> > 32: 48 89 81 00 03 00 00 mov %rax,0x300(%rcx)
> > 39: 48 89 99 18 03 00 00 mov %rbx,0x318(%rcx)
> >
> > %rcx should be pointing to the vcpu_vmx structure, but it's not even
> > canonical: 1ffff10035842e78.
> >
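The "not even canonical" remark about %rcx can be checked mechanically. Below is a minimal Python sketch, assuming 48-bit virtual addressing (bits 63..47 must all be equal); the helper name is made up for illustration. As a purely arithmetic observation that is not a conclusion from this thread, the sketch also prints the value shifted left by 3, since a right shift by 3 is the first step of the KASAN shadow checks in the listings above.

```python
# Canonical-address check for the %rcx value quoted above.
# Assumption: 48-bit virtual addressing on x86-64.
MASK64 = (1 << 64) - 1
RCX = 0x1ffff10035842e78


def is_canonical(addr):
    """True if bits 63..47 are all zeros or all ones."""
    top = addr >> 47
    return top == 0 or top == 0x1ffff


# Undo a hypothetical ">> 3" to see what pointer it might have come from.
rcx_shl3 = (RCX << 3) & MASK64

if __name__ == "__main__":
    print(hex(RCX), "canonical:", is_canonical(RCX))
    print(hex(rcx_shl3), "canonical:", is_canonical(rcx_shl3))
```

The shifted-back value is canonical and sits in the same ffff8801... range as the RSP and RBP values in the dump, so %rcx may be holding an intermediate of a shadow check rather than the vcpu_vmx pointer. That is only a guess and would need to be confirmed against the disassembly.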
Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
