Re: [PATCH v5 19/19] irqdomain: Switch to per-domain locking

From: Marc Zyngier
Date: Fri Feb 10 2023 - 10:07:09 EST


On Fri, 10 Feb 2023 12:57:40 +0000,
Johan Hovold <johan@xxxxxxxxxx> wrote:
>
> On Fri, Feb 10, 2023 at 11:38:58AM +0000, Marc Zyngier wrote:
> > On Fri, 10 Feb 2023 09:56:03 +0000,
> > Johan Hovold <johan@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Feb 09, 2023 at 04:00:55PM +0000, Marc Zyngier wrote:
> > > > On Thu, 09 Feb 2023 13:23:23 +0000,
> > > > Johan Hovold <johan+linaro@xxxxxxxxxx> wrote:
>
> > > I went back and forth over that a bit, but decided to only use
> > > domain->root->mutex in paths that can be called for hierarchical
> > > domains (i.e. the "shared code paths" mentioned above).
> > >
> > > Using it in paths that are clearly only called for non-hierarchical
> > > domains where domain->root == domain felt a bit lazy.
> >
> > My concern here is that as this code gets further refactored, it may
> > become much harder to reason about what is the correct level of
> > locking.
>
> Yeah, that's conceivable.
>
> > > The counter argument is of course that using domain->root->lock allows
> > > people to think less about the code they are changing, but that's not
> > > necessarily always a good thing.
> >
> > Eventually, non-hierarchical domains should simply die and be replaced
> > with a single level hierarchy. Having a unified locking in place will
> > definitely make the required work clearer.
> >
> > > Also note that the lockdep asserts in the revmap helpers would catch
> > > anyone using domain->mutex where they should not (i.e. using
> > > domain->mutex for an hierarchical domain).
> >
> > Lockdep is great, but lockdep is a runtime thing. It doesn't help
> > reasoning about what gets locked when changing this code.
>
> Contributers are expected to test their changes with lockdep enabled,
> right?
>
> But sure, using root->domain->mutex throughout may prevent prevent
> people from getting this wrong.
>
> I'll update this for v6.
>
> > > > > @@ -1132,6 +1147,7 @@ struct irq_domain *irq_domain_create_hierarchy(struct irq_domain *parent,
> > > > > else
> > > > > domain = irq_domain_create_tree(fwnode, ops, host_data);
> > > > > if (domain) {
> > > > > + domain->root = parent->root;
> > > > > domain->parent = parent;
> > > > > domain->flags |= flags;
> > > >
> > > > So we still have a bug here, as we have published a domain that we
> > > > keep updating. A parallel probing could find it in the interval and do
> > > > something completely wrong.
> > >
> > > Indeed we do, even if device links should make this harder to hit these
> > > days.
> > >
> > > > Splitting the work would help, as per the following patch.
> > >
> > > Looks good to me. Do you want to submit that as a patch that I'll rebase
> > > on or should I submit it as part of a v6?
> >
> > Just take it directly.
>
> Ok, thanks.
>
> I guess this turns the "Use irq_domain_create_hierarchy()" patches into
> fixes that should be backported as well.

Maybe. Backports are not my immediate concern.

> But note that your proposed diff may not be sufficient to prevent
> lookups from racing with domain registration generally. Many drivers
> still update the bus token after the domain has been added (and
> apparently some still set flags also after creating hierarchies I just
> noticed, e.g. amd_iommu_create_irq_domain).

The bus token should only rarely be a problem, as it is often set on
an intermediate level which isn't directly looked-up by anything else.
And if it did happen, it would probably result in a the domain not
being found.

Flags, on the other hand, are more problematic. But I consider this a
driver bug which should be fixed independently.

> It seems we'd need to expose a separate allocation and registration
> interface, or at least pass in the bus token to a new combined
> interface.

Potentially, yes. But this could come later down the line. I'm more
concerned in getting this series into -next, as the merge window is
fast approaching.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.