RE: [PATCH 1/1] arm64: kexec: no need to do irq_chip->irq_mask if it already masked

From: Jason Liu
Date: Thu Aug 13 2020 - 02:03:27 EST




> -----Original Message-----
> From: Marc Zyngier <maz@xxxxxxxxxx>
> Sent: Thursday, August 6, 2020 8:26 PM
> To: Jason Liu <jason.hui.liu@xxxxxxx>
> Cc: Sudeep Holla <sudeep.holla@xxxxxxx>; catalin.marinas@xxxxxxx;
> will@xxxxxxxxxx; sashal@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH 1/1] arm64: kexec: no need to do irq_chip->irq_mask if it
> already masked
>
> On 2020-08-06 11:05, Jason Liu wrote:
> >> -----Original Message-----
> >> From: Marc Zyngier <maz@xxxxxxxxxx>
>
> [...]
>
> >> > No, this patch is not papering over a much deeper issue in the driver.
> >> > This is just to make things better for the ARM64 kexec.
> >>
> >> Yes, I'm sure it is... However:
> >>
> >> request_irq()
> >> <goes into suspend, panic somewhere after having turned the irqchip
> >> clock off> if (chip->irq_mask && !irqd_irq_masked(&desc->irq_data))
> >> <explodes, as the interrupt isn't masked>
> >>
> >> This is because the PM in the irqsteer driver is completely busted:
> >> request_irq() should get a reference on the driver to prevent it from
> >> being suspended. Since you don't implement it correctly, this doesn't
> >> happen and your "improvement" doesn't help at all.
> >
> > The request_irq will get a reference to prevent the irqchip from being
> > suspended due to it call irq_chip_pm_get(). I am pretty sure we have
> > implemented correctly and that is also the common Linux code.
>
> Then it seems you cannot read your own driver. At no point do you set the
> parent_device that would give you a fighting chance to get the device clocked
> and powered on. How does it work? Magic?
>
> > In order to save power and let the irqchip enter into runtime SUSPEND
> > mode, the driver will call free_irq() When it was not used(idle). Then
> > free_irq() will decrease the reference and let the irqchip enter
> > suspend state.
>
> The reference count on *what*? There is nothing to take a reference on. So yes,
> you understand how the core kernel works. But you don't seem to notice that
> there is no link between the irq and the device that implements the controller.

See the code, it will call irq_chip_pm_put(&desc->irq_data)

/*
* Internal function to unregister an irqaction - used to free
* regular and special interrupts that are part of the architecture.
*/
static struct irqaction *__free_irq(struct irq_desc *desc, void *dev_id)
{
..
irq_chip_pm_put(&desc->irq_data);
module_put(desc->owner);
kfree(action->secondary);
return action;
}

>
> > So, when the irqchip entered suspend, which means there is no user for
> > the irqchip and all the irqs were DISABLED + MASKED.
> > Due to the runtimePM support for the irqchip, when kexec runs, it will
> > sometimes meet the situation that the irqchip is suspend due to no
> > users for it. So from either the performance(time cost) or coding
> > logic, the machine_kexec_mask_interrupts() should not do double mask
> > for the irqs which already masked.
>
> I strongly suggest you start by fixing the damn driver first.

Our driver does not have issue at all. What to fix?

>
> In the meantime, NAK to this patch.

Anyway, it seems don't really understand this issue and you just simply reject one reasonable fix. Sounds not good at all.

>
> M.
> --
> Jazz is not dead. It just smells funny...