Re: [PATCH] driver core: platform: Rename platform_get_irq_optional() to platform_get_irq_silent()

From: Andy Shevchenko
Date: Tue Jan 25 2022 - 09:08:53 EST


On Tue, Jan 25, 2022 at 01:56:05PM +0100, Matthias Schiffer wrote:
> On Tue, 2022-01-25 at 09:25 +0100, Geert Uytterhoeven wrote:
> > On Mon, Jan 24, 2022 at 10:02 PM Sergey Shtylyov <s.shtylyov@xxxxxx>
> > wrote:
> > > On 1/24/22 6:01 PM, Andy Shevchenko wrote:

...

> > > > > > 2. The vIRQ0 handling: a) WARN() followed by b) returned
> > > > > > value 0
> > > > >
> > > > > I'm happy with the vIRQ0 handling. Today platform_get_irq() and
> > > > > it's
> > > > > silent variant returns either a valid and usuable irq number or
> > > > > a
> > > > > negative error value. That's totally fine.
> > > >
> > > > It might return 0.
> > > > Actually it seems that the WARN() can only be issued in two
> > > > cases:
> > > > - SPARC with vIRQ0 in one of the array member
> > > > - fallback to ACPI for GPIO IRQ resource with index 0
> > >
> > > You have probably missed the recent discovery that
> > > arch/sh/boards/board-aps4*.c
> > > causes IRQ0 to be passed as a direct IRQ resource?
> >
> > So far no one reported seeing the big fat warning ;-)
>
> FWIW, we had a similar issue with an IRQ resource passed from the
> tqmx86 MFD driver do the GPIO driver, which we noticed due to this
> warning, and which was fixed
> in a946506c48f3bd09363c9d2b0a178e55733bcbb6
> and 9b87f43537acfa24b95c236beba0f45901356eb2.

No, it's not, unfortunately :-( You just band aided the warning issue, but the
root cause is the WARN() and possibility to see valid (v)IRQ0 in the resources.
See below.

> I believe these changes are what promted this whole discussion and led
> to my "Reported-by" on the patch?
>
> It is not entirely clear to me when IRQ 0 is valid and when it isn't,
> but the warning seems useful to me. Maybe it would make more sense to
> warn when such an IRQ resource is registered for a platform device, and
> not when it is looked up?
>
> My opinion is that it would be very confusing if there are any places
> in the kernel (on some platforms) where IRQ 0 is valid,

And those places are board files like yours :( They have to be fixed
eventually. Ideally by using IRQ domains. At least that's how it's
done elsewhere.

> but for
> platform_get_irq() it would suddenly mean "not found". Keeping a
> negative return value seems preferable to me for this reason.

IRQ 0 is valid, vIRQ0 (or read it as cookie) is not.

Now, the problem in your case is that you are talking about board files, while
ACPI and DT never gives resource with vIRQ0. For board files some (legacy) code
decides that it's fine to supply HW IRQ, while the de facto case is that
platform_get_resource() returns whatever is in the resource, while
platform_get_irq() should return a cookie.

> (An alternative, more involved idea would be to add 1 to all IRQ
> "cookies", so IRQ 0 would return 1, leaving 0 as a special value. I
> have absolutely no idea how big the API surface is that would need
> changes, and it is likely not worth the effort at all.)

This is what IRQ domains do, they start vIRQs from 1.

> > > > The bottom line here is the SPARC case. Anybody familiar with the
> > > > platform
> > > > can shed a light on this. If there is no such case, we may remove
> > > > warning
> > > > along with ret = 0 case from platfrom_get_irq().
> > >
> > > I'm afraid you're too fast here... :-)
> > > We'll have a really hard time if we continue to allow IRQ0 to be
> > > returned by
> > > platform_get_irq() -- we'll have oto fileter it out in the callers
> > > then...
> >
> > So far no one reported seeing the big fat warning?
> >
> > > > > > 3. The specific cookie for "IRQ not found, while no error
> > > > > > happened" case
> > > > >
> > > > > Not sure what you mean here. I have no problem that a situation
> > > > > I can
> > > > > cope with is called an error for the query function. I just do
> > > > > error
> > > > > handling and continue happily. So the part "while no error
> > > > > happened" is
> > > > > irrelevant to me.
> > > >
> > > > I meant that instead of using special error code, 0 is very much
> > > > good for
> > > > the cases when IRQ is not found. It allows to distinguish -ENXIO
> > > > from the
> > > > low layer from -ENXIO with this magic meaning.
> > >
> > > I don't see how -ENXIO can trickle from the lower layers,
> > > frankly...
> >
> > It might one day, leading to very hard to track bugs.
>
> As gregkh noted, changing the return value without also making the
> compile fail will be a huge PITA whenever driver patches are back- or
> forward-ported, as it would require subtle changes in error paths,
> which can easily slip through unnoticed, in particular with half-
> automated stable backports.

Let's not modify kernel at all then, because in many cases it is a PITA
for back- or forward-porting :-)

> Even if another return value like -ENODEV might be better aligned with
> ...regulator_get_optional() and similar functions, or we even find a
> way to make 0 usable for this, none of the proposed changes strike me
> as big enough a win to outweigh the churn caused by making such a
> change at all.

Yeah, let's continue to suffer from ugly interface and see more band aids
landing around...

--
With Best Regards,
Andy Shevchenko