Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race

From: Jan Kiszka
Date: Fri Jul 04 2014 - 01:43:35 EST


On 2014-07-04 04:52, Wanpeng Li wrote:
> On Thu, Jul 03, 2014 at 01:27:05PM -0400, Bandan Das wrote:
> [...]
>> # modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0
>>
>> The Host CPU - Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
>> qemu cmd to run L1 -
>> # qemu-system-x86_64 -drive file=level1.img,if=virtio,id=disk0,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -drive file=level2.img,if=virtio,id=disk1,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -vnc :2 --enable-kvm -monitor stdio -m 4G -net nic,macaddr=00:23:32:45:89:10 -net tap,ifname=tap0,script=/etc/qemu-ifup,downscript=no -smp 4 -cpu Nehalem,+vmx -serial pty
>>
>> qemu cmd to run L2 -
>> # sudo qemu-system-x86_64 -hda VM/level2.img -vnc :0 --enable-kvm -monitor stdio -m 2G -smp 2 -cpu Nehalem -redir tcp:5555::22
>>
>> Additionally,
>> L0 is FC19 with 3.16-rc3
>> L1 and L2 are Ubuntu 14.04 with 3.13.0-24-generic
>>
>> Then start a kernel compilation inside L2 with "make -j3"
>>
>> There's no call trace on L0; both L0 and L1 are hung (or rather, really slow), and
>> the L1 serial console spews out CPU soft lockup errors. Enabling panic on soft lockup on L1 gives
>> a trace with smp_call_function_many(). I think the corresponding code in kernel/smp.c that
>> triggers this is
>>
>> WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
>> && !oops_in_progress && !early_boot_irqs_disabled);
>>
>> I know this is usually harmless, but in this specific case
>> it seems to be stuck here forever.
>>
>> Sorry, I don't have an L1 call trace handy atm; I can post one if you are interested.
>>
>> Note that this can take as much as 30 to 40 minutes to appear, but once it does,
>> you will know, because both L1 and L2 will be stuck with the serial messages I mentioned
>> before. On my side, let me try this on another system to rule out any machine-specific
>> weirdness.
>>
>
> Thanks for pointing this out.
>
>> Please let me know if you need any further information.
>>
>
> I just ran kvm-unit-tests w/ vmx.flat and eventinj.flat.
>
>
> w/ vmx.flat and w/o my patch applied
>
> [...]
>
> Test suite : interrupt
> FAIL: direct interrupt while running guest
> PASS: intercepted interrupt while running guest
> FAIL: direct interrupt + hlt
> FAIL: intercepted interrupt + hlt
> FAIL: direct interrupt + activity state hlt
> FAIL: intercepted interrupt + activity state hlt
> PASS: running a guest with interrupt acknowledgement set
> SUMMARY: 69 tests, 6 failures
>
> w/ vmx.flat and w/ my patch applied
>
> [...]
>
> Test suite : interrupt
> PASS: direct interrupt while running guest
> PASS: intercepted interrupt while running guest
> PASS: direct interrupt + hlt
> FAIL: intercepted interrupt + hlt
> PASS: direct interrupt + activity state hlt
> PASS: intercepted interrupt + activity state hlt
> PASS: running a guest with interrupt acknowledgement set
>
> SUMMARY: 69 tests, 2 failures

Which version (hash) of kvm-unit-tests are you using? All tests up to
307621765a run fine here, but since a0e30e712d not much completes
successfully anymore:

enabling apic
paging enabled
cr0 = 80010011
cr3 = 7fff000
cr4 = 20
PASS: test vmxon with FEATURE_CONTROL cleared
PASS: test vmxon without FEATURE_CONTROL lock
PASS: test enable VMX in FEATURE_CONTROL
PASS: test FEATURE_CONTROL lock bit
PASS: test vmxon
FAIL: test vmptrld
PASS: test vmclear
init_vmcs : make_vmcs_current error
FAIL: test vmptrst
init_vmcs : make_vmcs_current error
vmx_run : vmlaunch failed.
FAIL: test vmlaunch
FAIL: test vmlaunch

SUMMARY: 10 tests, 4 unexpected failures


Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux