Re: Unexpected interrupt received in Guest OS when booting after "system_reset"
From: Marc Zyngier
Date: Fri Mar 29 2019 - 06:54:54 EST
On 29/03/2019 09:19, Heyi Guo wrote:
> Hi Marc,
>
> The patch works. I tested for 1.5 hours and 52 VM resets. There were
> 16 times that a virtual LPI was left in the ap_list (seen by an
> additional printk) during reset, and we never saw "Unexpected
> interrupt received" any more.
Thanks for testing, much appreciated.
> Just a minor comment: how about replacing /vcpu->arch.vgic_cpu./ with
> /vgic_cpu->/ in the lock/unlock code line, to reduce some words?
Well, as I said, the patch is wrong in other ways, so I wouldn't bother
with that. It only serves as a test for my theory.
I think I'm slowly warming up to your initial proposal to hook things
into the PROPBASER/PENDBASER registers, as the LPIs do have a life
outside of the ITS itself.
I'll try to respin something next week.
Thanks,
M.
>
> Thanks,
>
> Heyi
>
> On 2019/3/29 9:19, Heyi Guo wrote:
>>
>>
>> On 2019/3/29 1:18, Marc Zyngier wrote:
>>> [Please do not send HTML emails]
>> Sorry; will keep in mind next time :)
>>>
>>> On 28/03/2019 15:44, Heyi Guo wrote:
>>>> Hi Marc and Christoffer,
>>>>
>>>> When we issue "system_reset" from qemu monitor to a running VM, guest
>>>> Linux will occasionally get "Unexpected interrupt" after rebooting, with
>>>> kernel message at the bottom.
>>>>
>>>> After some investigation, we found it might be caused by the
>>>> preservation of virtual LPI during system reset: it seems the virtual
>>>> LPI remains in the ap_list during VM reset, as well as its "enabled" and
>>>> "pending_latch" status, and this causes the virtual LPI to be injected
>>>> wrongly after the VCPU reboots and enables interrupts.
>>>>
>>>> We propose to clear the "enabled" flag of a virtual LPI when PROPBASER
>>>> (or GICR_CTLR) of the virtual GICR is written to 0, and to update the
>>>> virtual LPI properties when GICR_CTLR.EnableLPIs is set to 1 again.
>>>>
>>>> Any advice? Or did we miss something?
>>> We're clearly missing a trick here, but I'm not convinced of your
>>> approach.
>> To be honest, we were not fully convinced ourselves either. I was
>> worried about the guest switching GICR_CTLR or GICR_PROPBASER at
>> runtime, which would probably cause issues for our rough approach.
>>
>>> What should happen is that the redistributors should be reset
>>> as well, and that this should recall any LPI that has been made pending.
>>> Unfortunately, we don't seem to have such code in place, which is
>>> embarrassing.
>>>
>>> Can you give the following, untested patch a go? It isn't right either,
>>> but it should have the right effect. If you confirm that it solves your
>>> problem, we can look at adding the right hooks...
>> Thanks, I'll test this and get back to you.
>> Heyi
>>
>>> Thanks,
>>>
>>> M.
>>>
>>> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
>>> index ab3f47745d9c..bd9a9250f323 100644
>>> --- a/virt/kvm/arm/vgic/vgic-its.c
>>> +++ b/virt/kvm/arm/vgic/vgic-its.c
>>> @@ -2403,8 +2403,32 @@ static int vgic_its_commit_v0(struct vgic_its *its)
>>>  	return 0;
>>>  }
>>>
>>> +static void vgic_nuke_pending_lpis(struct kvm_vcpu *vcpu)
>>> +{
>>> +	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>>> +	struct vgic_irq *irq, *tmp;
>>> +	unsigned long flags;
>>> +
>>> +	raw_spin_lock_irqsave(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
>>> +
>>> +	list_for_each_entry_safe(irq, tmp, &vgic_cpu->ap_list_head, ap_list) {
>>> +		if (irq->intid >= VGIC_MIN_LPI) {
>>> +			list_del(&irq->ap_list);
>>> +			vgic_put_irq(vcpu->kvm, irq);
>>> +		}
>>> +	}
>>> +
>>> +	raw_spin_unlock_irqrestore(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
>>> +}
>>> +
>>>  static void vgic_its_reset(struct kvm *kvm, struct vgic_its *its)
>>>  {
>>> +	struct kvm_vcpu *vcpu;
>>> +	int c;
>>> +
>>> +	kvm_for_each_vcpu(c, vcpu, kvm)
>>> +		vgic_nuke_pending_lpis(vcpu);
>>> +
>>>  	/* We need to keep the ABI specific field values */
>>>  	its->baser_coll_table &= ~GITS_BASER_VALID;
>>>  	its->baser_device_table &= ~GITS_BASER_VALID;
>>>
--
Jazz is not dead. It just smells funny...
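
For readers who want to experiment with the idea outside the kernel, here is a minimal user-space sketch of what the test patch quoted above does: walk a per-VCPU list of queued interrupts and drop every entry whose INTID is in the LPI range (8192 and above), releasing the reference the list held. All names (model_irq, model_vcpu, model_queue_irq, model_nuke_pending_lpis) are hypothetical and invented for illustration. The sketch uses a plain singly linked list and takes no locks, so it is not the kernel code and shares none of its data structures.

```c
#include <assert.h>
#include <stddef.h>

/* GICv3 LPI INTIDs start at 8192; lower INTIDs are SGIs, PPIs and SPIs. */
#define MODEL_MIN_LPI 8192u

/* Hypothetical stand-in for an interrupt descriptor. */
struct model_irq {
	unsigned int intid;
	int refcount;            /* references held on this descriptor */
	int queued;              /* non-zero while on a VCPU list */
	struct model_irq *next;  /* singly linked list, for simplicity */
};

/* Hypothetical stand-in for the per-VCPU queue of active/pending IRQs. */
struct model_vcpu {
	struct model_irq *ap_list;
};

/* Queue an interrupt at the head of the list; the list holds a reference. */
static void model_queue_irq(struct model_vcpu *vcpu, struct model_irq *irq)
{
	irq->refcount++;
	irq->queued = 1;
	irq->next = vcpu->ap_list;
	vcpu->ap_list = irq;
}

/*
 * Unlink every LPI from the list and drop the reference the list held.
 * Non-LPI entries are left in place. Returns the number of entries dropped.
 */
static size_t model_nuke_pending_lpis(struct model_vcpu *vcpu)
{
	struct model_irq **link = &vcpu->ap_list;
	size_t dropped = 0;

	while (*link) {
		struct model_irq *irq = *link;

		if (irq->intid >= MODEL_MIN_LPI) {
			*link = irq->next;      /* unlink; link stays put */
			irq->next = NULL;
			irq->queued = 0;
			assert(irq->refcount > 0);
			irq->refcount--;
			dropped++;
		} else {
			link = &irq->next;      /* keep entry, advance */
		}
	}

	return dropped;
}
```

Calling model_nuke_pending_lpis() once per VCPU at reset time mirrors the loop added to vgic_its_reset() in the quoted diff. A real implementation would also need the locking and per-interrupt state handling that this model deliberately leaves out.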