Re: [PATCH v5 19/19] irqdomain: Switch to per-domain locking

From: Marc Zyngier
Date: Sat Feb 11 2023 - 07:52:33 EST


On Sat, 11 Feb 2023 11:35:32 +0000,
Johan Hovold <johan@xxxxxxxxxx> wrote:
>
> On Fri, Feb 10, 2023 at 03:06:37PM +0000, Marc Zyngier wrote:
> > On Fri, 10 Feb 2023 12:57:40 +0000,
> > Johan Hovold <johan@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Feb 10, 2023 at 11:38:58AM +0000, Marc Zyngier wrote:
> > > > On Fri, 10 Feb 2023 09:56:03 +0000,
> > > > Johan Hovold <johan@xxxxxxxxxx> wrote:
>
> > > > > > > @@ -1132,6 +1147,7 @@ struct irq_domain *irq_domain_create_hierarchy(struct irq_domain *parent,
> > > > > > >  	else
> > > > > > >  		domain = irq_domain_create_tree(fwnode, ops, host_data);
> > > > > > >  	if (domain) {
> > > > > > > +		domain->root = parent->root;
> > > > > > >  		domain->parent = parent;
> > > > > > >  		domain->flags |= flags;
> > > > > >
> > > > > > So we still have a bug here, as we have published a domain that we
> > > > > > keep updating. A parallel probing could find it in the interval and do
> > > > > > something completely wrong.
> > > > >
> > > > > Indeed we do, even if device links should make this harder to hit these
> > > > > days.
> > > > >
> > > > > > Splitting the work would help, as per the following patch.
> > > > >
> > > > > Looks good to me. Do you want to submit that as a patch that I'll rebase
> > > > > on or should I submit it as part of a v6?
> > > >
> > > > Just take it directly.
> > >
> > > Ok, thanks.
>
> I've added a commit message and turned it into a patch to include in v6
> now:
>
> commit 3af395aa894c7df94ef2337e572e5e1710b4bbda (HEAD -> work)
> Author: Marc Zyngier <maz@xxxxxxxxxx>
> Date: Thu Feb 9 16:00:55 2023 +0000
>
> irqdomain: Fix domain registration race
>
> Hierarchical domains created using irq_domain_create_hierarchy() are
> currently added to the domain list before having been fully initialised.
>
> This specifically means that a racing allocation request might fail to
> allocate irq data for the inner domains of a hierarchy in case the
> parent domain pointer has not yet been set up.
>
> Note that this is not really any issue for irqchip drivers that are
> registered early via IRQCHIP_DECLARE() or IRQCHIP_ACPI_DECLARE(), but
> could potentially cause trouble with drivers that are registered later
> (e.g. when using IRQCHIP_PLATFORM_DRIVER_BEGIN(), gpiochip drivers,
> etc.).
>
> Fixes: afb7da83b9f4 ("irqdomain: Introduce helper function irq_domain_add_hierarchy()")
> Cc: stable@xxxxxxxxxxxxxxx # 3.19
> ...
> [ johan: add a commit message ]
> Signed-off-by: Johan Hovold <johan+linaro@xxxxxxxxxx>
>
> Could you just give your SoB for the diff here so I can credit you as
> author?

Thanks for that. Feel free to add:

Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
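
For anyone following along, the race described in the commit message
comes down to ordering: initialise everything first, publish last. What
follows is only a minimal userspace sketch of that pattern, not the
actual patch and not kernel code. The demo_* names, the struct layout
and the pthread mutex are all made up for illustration; the real code
uses struct irq_domain and the irqdomain locking.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct irq_domain. */
struct demo_domain {
	struct demo_domain *parent;
	struct demo_domain *root;
	unsigned int flags;
	struct demo_domain *next;	/* link in the global list */
};

static pthread_mutex_t demo_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_domain *demo_list;

/* Step 1: allocate and initialise, without making the domain visible. */
static struct demo_domain *demo_domain_alloc(struct demo_domain *parent,
					     unsigned int flags)
{
	struct demo_domain *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->parent = parent;
	/* A domain without a parent is its own root. */
	d->root = parent ? parent->root : d;
	d->flags = flags;
	return d;
}

/* Step 2: publish the fully initialised domain under the list lock. */
static void demo_domain_publish(struct demo_domain *d)
{
	assert(d != NULL);
	pthread_mutex_lock(&demo_list_lock);
	d->next = demo_list;
	demo_list = d;
	pthread_mutex_unlock(&demo_list_lock);
}

/* Lookups only see domains whose fields were set before publishing. */
static struct demo_domain *demo_domain_find(unsigned int flags)
{
	struct demo_domain *d;

	pthread_mutex_lock(&demo_list_lock);
	for (d = demo_list; d; d = d->next)
		if (d->flags == flags)
			break;
	pthread_mutex_unlock(&demo_list_lock);
	return d;
}
```

Because parent, root and flags are all written before the domain is
linked into the list, a concurrent demo_domain_find() can never return
a half-initialised entry. Setting any of them after the publish step
would reopen the window.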

>
> > > I guess this turns the "Use irq_domain_create_hierarchy()" patches into
> > > fixes that should be backported as well.
> >
> > Maybe. Backports are not my immediate concern.
>
> Turns out all of those drivers are registered early via
> IRQCHIP_DECLARE() or IRQCHIP_ACPI_DECLARE() so there shouldn't really be
> any risk of hitting this race for those.
>
> > > But note that your proposed diff may not be sufficient to prevent
> > > lookups from racing with domain registration generally. Many drivers
> > > still update the bus token after the domain has been added (and
> > > apparently some still set flags also after creating hierarchies I just
> > > noticed, e.g. amd_iommu_create_irq_domain).
> >
> > The bus token should only rarely be a problem, as it is often set on
> > an intermediate level which isn't directly looked-up by anything else.
> > And if it did happen, it would probably result in the domain not
> > being found.
> >
> > Flags, on the other hand, are more problematic. But I consider this a
> > driver bug which should be fixed independently.
>
> I agree.
>
> > > It seems we'd need to expose a separate allocation and registration
> > > interface, or at least pass in the bus token to a new combined
> > > interface.
> >
> > Potentially, yes. But this could come later down the line. I'm more
> > concerned in getting this series into -next, as the merge window is
> > fast approaching.
>
> I'll post a v6 first thing Monday if you can give me that SoB before
> then.

You should be all set. Please post the series at your earliest
convenience, and I'll let it simmer in -next for a bit.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.