Re: [PATCH] generic: Add the exception case checking routine for ppi interrupt

From: Marc Zyngier
Date: Wed Aug 31 2016 - 04:37:46 EST


On 31/08/16 07:35, majun (F) wrote:
> Hi Marc & Mark:
>
> On 2016/8/30 19:21, Mark Rutland wrote:
>> On Tue, Aug 30, 2016 at 12:07:36PM +0100, Marc Zyngier wrote:
>>> +Mark
>>> On 30/08/16 11:35, majun (F) wrote:
>>>> On 2016/8/30 16:50, Marc Zyngier wrote:
>>>>> On 30/08/16 05:17, MaJun wrote:
>>>>>> From: Ma Jun <majun258@xxxxxxxxxx>
>>>>>>
>>>>>> During system boot, if an interrupt that has no action registered
>>>>>> is triggered, the system panics when it tries to access the
>>>>>> action member.
>>>>>
>>>>> And why would that interrupt be enabled? If you enable a PPI before
>>>>> registering a handler, you're doing something wrong.
>>>>
>>>> Actually, the problem described above happened while the capture
>>>> kernel was booting.
>>>>
>>>> In my system, there is sometimes a pending physical timer
>>>> interrupt (30) when the first kernel panics, and that status is kept
>>>> until the capture kernel boots.
>>>
>>> And that's perfectly fine. The interrupt can be pending forever, as it
>>> shouldn't get enabled.
>>>
>>>> So, this interrupt will be handled while the capture kernel is booting.
>>>
>>> Why? Who enables it?
>>>
>>>> Because we use the virt timer interrupt rather than the physical timer
>>>> interrupt in the capture kernel, interrupt 30 has no action handler.
>>>
>>> Again: who enables this interrupt? Whichever driver enables it should be
>>> fixed.
>>
>> I'm also at a loss.
>>
>> In this case, arch_timer_uses_ppi must be VIRT_PPI. So in
>> arch_timer_register(), we'll only request_percpu_irq the virt PPI.
>> arch_timer_has_nonsecure_ppi() will be false, given arch_timer_uses_ppi
>> is VIRT_PPI, so in arch_timer_starting_cpu() we'll only
>> enable_percpu_irq() the virt PPI.
>>
>> We don't fiddle with arch_timer_uses_ppi after calling
>> arch_timer_register(). So I can't see how we could enable another IRQ in
>> this case.
>>
>> Looking at the driver in virt/kvm/arm/arch_timer.c, we only enable what
>> we've successfully requested, so it doesn't seem like there's an issue
>> there.
>>
>> From a quick look at the GIC driver, it looks like we reset PPIs
>> correctly, so it doesn't look like we have a "latent enable".
>>
>
> I just checked the status of irq 30 while the capture kernel was booting.
>
> After arch_timer_starting_cpu() is called, the status of irq 30 is:
> masked, pending.
> Because irq 30 triggered only once while the capture kernel was booting,
> I think this problem may have happened in a case like:
> 1: irq 30 triggered, but not acked by the CPU yet.
> 2: local_irq_disable() called
> 3: system reboot --> capture kernel booting
> 4: local_irq_enable()
> 5: irq 30 acked by the CPU.
>
> Is this case possible?

I can't see how, because you've missed:

3b: All PPIs are disabled as each CPU comes up

So for (5) to occur, I can only see two possibilities:
(a) either something else is enabling the timer PPI
(b) your GIC doesn't correctly retire a pending PPI that is being disabled

I'm discounting (b) because I can't see how the system would work
otherwise, so (a) must be happening somehow.
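
To make the argument concrete, here is a toy model of the sequence
above. It is only a sketch, not GIC driver code: the struct and every
toy_* name are made up for illustration, it assumes PPIs are IDs 16-31
with the virtual timer on 27 and the physical timer on 30, and it picks
the lowest deliverable ID instead of modelling priorities.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PPI_BASE 16u   /* first PPI interrupt ID */
#define NR_PPI   16u   /* PPIs are IDs 16..31 */

/* Toy per-CPU PPI state; NOT the real GIC register layout. */
struct toy_ppi_state {
	uint16_t enabled;  /* bit n set: PPI (16 + n) is enabled */
	uint16_t pending;  /* bit n set: PPI (16 + n) is pending */
};

static uint16_t toy_ppi_bit(unsigned int irq)
{
	assert(irq >= PPI_BASE && irq < PPI_BASE + NR_PPI);
	return (uint16_t)(1u << (irq - PPI_BASE));
}

/* Step 3b: all PPIs are disabled as the CPU comes up.
 * The pending state is deliberately left untouched. */
static void toy_cpu_init(struct toy_ppi_state *s)
{
	s->enabled = 0;
}

static void toy_enable_ppi(struct toy_ppi_state *s, unsigned int irq)
{
	s->enabled |= toy_ppi_bit(irq);
}

static void toy_set_pending(struct toy_ppi_state *s, unsigned int irq)
{
	s->pending |= toy_ppi_bit(irq);
}

/* Step 5: the CPU can only ack a PPI that is both enabled and pending.
 * Returns the lowest such interrupt ID, or -1 if nothing is deliverable. */
static int toy_ack(const struct toy_ppi_state *s)
{
	uint16_t deliverable = s->enabled & s->pending;

	for (unsigned int n = 0; n < NR_PPI; n++)
		if (deliverable & (1u << n))
			return (int)(PPI_BASE + n);
	return -1;
}
```

In this model, a pending irq 30 carried over from the crashed kernel is
never acked after toy_cpu_init(), not even once the virt PPI (27) is
enabled. toy_ack() only returns 30 after something explicitly calls
toy_enable_ppi() for 30, which is case (a).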

Thanks,

M.
--
Jazz is not dead. It just smells funny...