Re: [PATCH 04/15] irqchip/gic: WARN if setting the interrupt type fails

From: Marc Zyngier
Date: Mon Apr 11 2016 - 11:39:31 EST


On 11/04/16 16:31, Jon Hunter wrote:
> Hi Marc,
>
> On 09/04/16 11:58, Marc Zyngier wrote:
>> On Thu, 17 Mar 2016 15:04:01 +0000
>> Jon Hunter <jonathanh@xxxxxxxxxx> wrote:
>>
>>>
>>> On 17/03/16 14:51, Thomas Gleixner wrote:
>>>> On Thu, 17 Mar 2016, Jon Hunter wrote:
>>>>
>>>>> Setting the interrupt type for private peripheral interrupts (PPIs) may
>>>>> not be supported by a given GIC because it is IMPLEMENTATION DEFINED
>>>>> whether this is allowed. There is no way to know if setting the type is
>>>>> supported for a given GIC and so the value written is read back to
>>>>> verify it matches the desired configuration. If it does not match then
>>>>> an error is returned.
>>>>>
>>>>> There are cases where the interrupt configuration read from firmware
>>>>> (such as a device-tree blob) has been incorrect and hence
>>>>> gic_configure_irq() has returned an error. This error has gone
>>>>> undetected because the error code returned was ignored but the interrupt
>>>>> still worked fine because the configuration for the interrupt could not
>>>>> be overwritten.
>>>>>
>>>>> Given that this has gone undetected and we should only fail to set the
>>>>> type for PPIs whose configuration cannot be changed anyway, don't return
>>>>> an error and simply WARN if this fails. This will allow us to fix up any
>>>>> places in the kernel where we should be checking the return status and
>>>>> maintain back compatibility with firmware images that may have incorrect
>>>>> interrupt configurations.
>>>>
>>>> Though silently returning 0 is really the wrong thing to do. You can add the
>>>> warn, but why do you want to return success?
>>>
>>> Yes, I agree that would be the correct thing to do. However, the problem
>>> is that if we do this, then after the patch "irqdomain: Don't set type
>>> when mapping an IRQ" is applied, we may break interrupts for some
>>> existing device-tree binaries that have bad configuration (such as omap4
>>> and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it
>>> is a back compatibility issue.
>>>
>>> If you are wondering why these interrupts break after "irqdomain: Don't
>>> set type when mapping an IRQ", it is because today
>>> irq_create_fwspec_mapping() does not check the return code from setting
>>> the type, but if we defer setting the type until __setup_irq() which
>>> does check the return code, then all of a sudden interrupts that were
>>> working (even with bad configurations) start to fail.
>>>
>>> The reason why I opted not to return an error code from
>>> gic_configure_irq() is that it really can't fail. The failure being reported
>>> does not prevent the interrupt from working, but tells you your
>>> configuration does not match the hardware setting which you cannot
>>> overwrite.
>>>
>>> So to maintain back compatibility and avoid any silent errors, I opted
>>> to make it a WARN and not return an error.
>>>
>>> If people are ok with potentially breaking interrupts for device-tree
>>> binaries with bad settings, then I am ok to return an error here.
>>
>> I think we need to phase things. Let's start with warning people for a
>> few kernel releases. Actively maintained platforms will quickly address
>> the issue (fixing their DT). As I see it, this issue seems rather
>> widespread (even kvmtool outputs a DT with the wrong triggering
>> information).
>>
>> Once we've fixed the bulk of the platforms and virtual environments, we
>> can start thinking about making it fail harder.
>
> Ok, so are you OK with this patch as-is? If so, can I add your ACK?

It depends where you plan to handle the error. Ideally, I'd keep on
returning the error (because that's the right thing to do), and move the
WARN_ON() into the core code. We'd keep on ignoring the error as we're
doing today, but we'd scream about it.

After a couple of releases, we'd turn the WARN_ON into a hard fail.

Thoughts?

M.
--
Jazz is not dead. It just smells funny...