Re: kvm-unit-test fail for split irqchip

From: Wanpeng Li
Date: Tue Sep 13 2016 - 23:52:55 EST


2016-09-14 3:01 GMT+08:00 Radim Krcmar <rkrcmar@xxxxxxxxxx>:
> 2016-09-13 19:06+0800, Wanpeng Li:
>> Add -kernel_irqchip=split
>> ./x86-run x86/eventinj.flat
>>
>> qemu-system-x86_64 -enable-kvm -machine kernel_irqchip=split -cpu host
>> -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc
>> none -serial stdio -device pci-testdev -kernel x86/eventinj.flat
>> enabling apic
>> paging enabled
>> cr0 = 80010011
>> cr3 = 7fff000
>> cr4 = 20
>> Sending vec 33 and 62 and mask one with TPR
>> irq1 running
>> irq1 running
>> After 33/62 TPR test
>> FAIL: TPR
>> irq0 running
>> irq0 running
>>
>> Both irq1 and irq0 are executing twice.
>>
>> qemu-system-x86-22794 [001] d..2 34591.708476: kvm_entry: vcpu 0
>> qemu-system-x86-22794 [001] ...1 34591.708478: kvm_exit: reason
>> MSR_WRITE rip 0x401f33 info 0 0
>> qemu-system-x86-22794 [001] ...1 34591.708478: kvm_apic: apic_write
>> APIC_EOI = 0x0
>> qemu-system-x86-22794 [001] ...1 34591.708479: kvm_eoi: apicid 0 vector 62
>> qemu-system-x86-22794 [001] ...1 34591.708479: kvm_msr: msr_write 80b = 0x0
>> qemu-system-x86-22794 [001] d..2 34591.708480: kvm_entry: vcpu 0
>> qemu-system-x86-22794 [001] ...1 34591.708482: kvm_exit: reason
>> PENDING_INTERRUPT rip 0x401f35 info 0 0
>> qemu-system-x86-22794 [001] ...1 34591.708482: kvm_userspace_exit:
>> reason KVM_EXIT_IRQ_WINDOW_OPEN (7)
>> qemu-system-x86-22794 [001] ...1 34591.708491: kvm_inj_virq: irq 62
>> qemu-system-x86-22794 [001] d..2 34591.708492: kvm_entry: vcpu 0
>> qemu-system-x86-22794 [001] ...1 34591.708493: kvm_exit: reason
>> IO_INSTRUCTION rip 0x4016ec info 3fd0008 0
>>
>> From the trace we can see there is an interrupt window exit after the
>> first interrupt EOI(irq 62), and the same irq(62) is injected
>> duplicately after the interrupt window.
>>
>> The bug can disappear if kernel_irqchip is on or -x2apic, the virtual
>> x2apic mode will not be set due to commit (8d14695f9542 x86, apicv:
>> add virtual x2apic support), so that tpr shadow in the x2apic doesn't
>> work and wrmsr TPR register will trigger vmexit, and then kvmvapic
>> will be used to optimize flexpriority=N or AMD. The
>> report_trp_access() which is called in kvm_lapic_reg_write() will
>> trigger a userspace exit.
>>
>> TPR report access callbacks in qemu, kvm_handle_tpr_access() ->
>> apic_handle_tpr_access_report() -> vapic_report_tpr_access() ->
>> cpu_synchronize_state() will get apic register states from kvm.
>>
>> Later, kvm_arch_pre_run -> cpu_get_pic_interrupt(if there is a pic
>> interrupt) -> apic_get_interrupt, it is a pic interrupt, however it
>> gets the stale irq from apic register sync by report tpr access and
>> KVM_INTERRUPT the second duplicate interrupt.
>>
>> Paolo pointed out it is not the TPR associated bug, and we should
>> figure out why there is an interrupt window exit after the first EOI.
>
> Yeah, seems like QEMU bug. QEMU does KVM_INTERRUPT(62) ioctl after KVM
> exits with KVM_EXIT_IRQ_WINDOW_OPEN, which QEMU requested while the
> guest was printing. The printing calls
>
> serial_update_irq() -> qemu_irq_lower() -> qemu_set_irq() ->
> gsi_handler() -> qemu_set_irq() -> pic_irq_request() ->
> apic_deliver_pic_intr() -> kvm_handle_interrupt()
>
> kvm_handle_interrupt() does
>
> interrupt_request |= CPU_INTERRUPT_HARD
>
> which later calls cpu_get_pic_interrupt() in kvm_arch_pre_run(), but
> that function uses stale information from APIC and injects 62 again.
> If we synchronized the APIC, then the test would #GP, because there
> would be no injectable interrupt in LAPIC or PIC, so pic_read_irq()
> would return 15, thinking it was spurious.
>
> I think the bug starts in pic_irq_request(), which should not touch
> LAPIC. The patch below makes it work (just the second hunk is
> sufficient), but it's still far from sane code ...

Thanks for the analysis, Radim! :)

Regards,
Wanpeng Li