Re: [PATCH 1/1] arm64: kexec: no need to do irq_chip->irq_mask if it already masked

From: Marc Zyngier
Date: Thu Aug 06 2020 - 13:42:58 EST


On 2020-08-06 11:05, Jason Liu wrote:
-----Original Message-----
From: Marc Zyngier <maz@xxxxxxxxxx>

[...]

> No, this patch is not papering over a much deeper issue in the driver.
> This is just to make things better for the ARM64 kexec.

Yes, I'm sure it is... However:

request_irq()
<goes into suspend, panic somewhere after having turned the irqchip clock off>
if (chip->irq_mask && !irqd_irq_masked(&desc->irq_data))
<explodes, as the interrupt isn't masked>

This is because the PM in the irqsteer driver is completely busted:
request_irq() should get a reference on the driver to prevent it from being
suspended. Since you don't implement it correctly, this doesn't happen and
your "improvement" doesn't help at all.

The request_irq will get a reference to prevent the irqchip from being
suspended due to it call
irq_chip_pm_get(). I am pretty sure we have implemented correctly and
that is also the common Linux code.

Then it seems you cannot read your own driver. At no point do you
set the parent_device that would give you a fighting chance to
get the device clocked and powered on. How does it work? Magic?

In order to save power and let the irqchip enter into runtime SUSPEND
mode, the driver will call free_irq()
When it was not used(idle). Then free_irq() will decrease the
reference and let the irqchip enter suspend state.

The reference count on *what*? There is nothing to take a reference
on. So yes, you understand how the core kernel works. But you don't
seem to notice that there is no link between the irq and the device
that implements the controller.

So, when the irqchip entered suspend, which means there is no user for
the irqchip and all the irqs were DISABLED + MASKED.
Due to the runtimePM support for the irqchip, when kexec runs, it will
sometimes meet the situation that the irqchip is suspend due to
no users for it. So from either the performance(time cost) or coding
logic, the machine_kexec_mask_interrupts() should not do
double mask for the irqs which already masked.

I strongly suggest you start by fixing the damn driver first.

In the meantime, NAK to this patch.

M.
--
Jazz is not dead. It just smells funny...